Analysis and Forecast of Start-Up Companies

Introduction

Crunchbase released a dataset detailing approximately 18,500 startups founded since 1920. The dataset contains a number of interesting features, including startup locations, rounds of investment by type, early investors, monetary value of investments, company status and outcome. We used the version published on November 4th, 2013, and our central assumption is that the companies’ funding information is reliable. We recognize that identifying a promising startup is a hard problem, because the most innovative startups tend to disrupt existing markets. However, this dataset allows us to “follow the money” and gain insight into what a successful startup looks like, with a focus on how an investor could exploit this knowledge for monetary gain. We follow the startups all the way to their IPO and beyond into the stock market, to see which companies grew from humble beginnings into the large, financially stable companies that rule the market today. For the post-IPO analysis, we used stock information from Yahoo Finance for the public companies in our dataset.

Trends in Investments

Working with the time series data, we looked at two factors: total investment and the number of startup companies in a given year. These two factors matter because they can indicate both the general state of the United States economy and the success of startup companies. Figure 1 shows the pattern of investment since 1987.

Figure 1 The plot shows investment in the United States from 1987 to 2013. The color fill indicates the number of startup companies in each year. The six vertical red dotted lines mark important events.

Figure 2 The heat map shows the temporal trend across different types of industry. (Note that the “Cleantech” industry refers to companies innovating in areas such as biofuels, electric vehicles, solar panels and advanced nuclear technologies.)

One interesting thing to notice is that total investment rose briefly around the time Google was founded in 1998. Shortly after, investment declined in reaction to the market collapse that followed the bursting of the dot-com (internet) bubble. After 2004, the year Facebook was founded, America experienced an overall increase in investment. We suggest that investors were searching for the next Google, which led to a sharp rise in investment. One possible reason for the swift increase is the growing popularity of the Internet for social networking and marketing, through sites like Orkut and LinkedIn and through small online businesses. From 2005, the growth of investment slowed as the U.S. housing bubble peaked in 2006 and then burst. Loans and investment from financial institutions decreased during this period, and the resulting interbank credit crisis led to a 23% decrease in total investment.

Software companies seemed less affected by the recession, however, since the number of software and mobile startups continued to increase (shown in Figure 2). One explanation could be the ascent of smartphones like the iPhone and Android phones, which created entirely new markets for app development and mobile applications. Total investment remained rather low from 2009 to 2010 and then revived as economic conditions improved. After investment reached its highest point in 2011, the graph again shows a decrease in seed investments and in the number of new startups founded. One reading is that investors are struggling to attract the best founders and to make seed investments in promising companies. Another is that many young accelerators and incubators may struggle to survive in an overcrowded market for early-stage funding. It appears that the market may have reached a saturation point: in the past, investment tended to increase whenever a new market emerged, but no entirely new market has been created by technological advancement since the mobile app market emerged. Should another new technology prove transformative enough to create more room for startups, we could expect another spike in investment and in the number of startups founded.

Geographical Trends

The dataset contains 42 categories of companies, spread across 601 locations. The map in Figure 3 shows the state-by-state distribution of total investment dollars. Massachusetts, California, Texas and Washington are the top states for startups, since companies in those four states collectively raise the most money. In addition to a strong presence in the software industry, Massachusetts and California (particularly the SF Bay area) are home to many Biotech companies. California has a large number of small and diverse biotech companies, whereas Massachusetts has a more focused segment of the Biotech industry, with most companies working in drug discovery and development. This helps explain why investment there is large compared to other states. Texas and Washington may also raise very high dollar amounts because of their business-friendly tax environments, most notably the lack of a state personal or corporate income tax.

As one might expect, most of the startups in our data set are clustered around major cities. However, we must be careful when drawing conclusions from this map, because our data set is biased towards companies that received some form of venture funding. Startups elsewhere that did not receive funding are not represented here.

Figure 3 The map shows the geographic location of investments and the investment each state received from 1987 to 2013. We can see the top 10 startup companies by investment, grouped into regions, together with the sum of investment received by each state.

Figure 4 The heat map shows the geographical trend across different types of industry.

The heat map in Figure 4 shows the top ten regions for startups and how they are represented across the top ten industries. It shows that the San Francisco (SF) Bay area, Greater New York, Greater Los Angeles and Boston are the most popular hubs for startup companies in virtually every top industry. The software industry in particular is present in all ten regions, with the SF Bay area, not surprisingly, being the most popular. The Biotech industry is dominant in SF Bay, Boston, Greater Los Angeles, San Diego, Washington DC and Seattle. We believe this could be due to the presence of elite universities and research institutes such as Harvard and MIT in the Boston area, Stanford and UC Berkeley around the SF Bay area, the University of Washington in Seattle, and the National Institutes of Health and Johns Hopkins University around the DC area.

Different areas seem to engender different types of startups. New York has a strong showing in the advertising, web and software industries, but seems surprisingly short of the Biotech and Cleantech startups one might expect to find in such a large and investor-dense region. Boston also has a noteworthy number of “enterprise” companies, which are geared towards facilitating business functions rather than selling products or services to consumers. This could help explain the region’s remarkably dominant software industry, which relies heavily on business-to-business services.

Type of Investment

The type of funding is an important consideration for entrepreneurs who need to cover the early costs of starting a business. Series A refers to the first round of stock offered to investors during early-stage rounds. Typical Series A rounds fall in the range of $2-5M in exchange for 20-40% of the company, and are intended to support a company through the early stages of building a business, from product development to hiring to marketing. Series B refers to second-stage financing. Series C/C+ funding comes later: as companies grow, they may continue to seek additional funds to meet future milestones. Facebook and Twitter, for example, both received Series C+ funding. Angel investors are typically wealthy individuals, and venture capitalists are typically firms, that provide seed funding to new startups. Such investors may also take stakes in public companies, sometimes through instruments such as convertible debt; for example, Warren Buffett invested in Goldman Sachs during the 2008 financial crisis. Equity funding, which includes equity crowd-funding, is an umbrella term for any means of financing a company in which it receives money in exchange for issuing shares of its stock.

Figure 5 Type, Rounds and Amount of Investment in Various Types of Industry.

Figure 5 shows a clear trend of financial institutions investing more than individuals. Individual investors appear more interested in funding software, mobile and web companies, perhaps because they seek quicker returns than financial institutions do. Software companies also go through more rounds of Series A/B/C+ or venture funding than biotech companies, although total investment in the biotech industry is larger. This is plausible because developing a biotech product takes far more time and money than developing a software product. It also suggests that, although the risk in biotech is high, investors are willing to invest because the potential return on investment is very high compared to the software industry. For example, a biotech company that develops a drug that cures cancer or tuberculosis, or a successful diagnostic product, could earn a much higher profit margin than a typical software company.

Performance of Investors

We were curious about the most important investors, so we looked at how successful different investors are at picking startups that will be acquired or go public rather than shut down. To do this, we set aside companies in the data set that are still operating and focused on those with a definite outcome, be it a buyout, an IPO, or a bankruptcy. We could not obtain data on how much of a company was sold in a merger or acquisition, or how many shares a particular investor sold during the IPO, so the link between these outcomes and financial performance is tenuous. For example, an investor in Google or Apple would have gained a great deal by holding its shares rather than selling them, whereas shares of Zynga, a well-known online video game company, fell to roughly a quarter of their IPO price.

Figure 6 The histogram shows the success of the top 20 investors.

Figure 6 suggests the top investors are Sequoia Capital, SV Angel and New Enterprise Associates, as they have the highest success rates. It is interesting to note that most of the companies the top investors choose to fund end up being acquired, and have a rather small chance of reaching an initial public offering. This seems to reinforce the trend that many startups prefer to leverage their success into a buyout and avoid the uncertainty of going public. Furthermore, given the unpredictable nature of venture capital, investors likely prefer a “sure thing” like an acquisition to an IPO. It is important to note that the “Closed” category is likely heavily underrepresented in the data set, which hints at “survivorship bias”. Survivorship bias is the logical error of concentrating on the companies that “survived” while inadvertently overlooking those that did not, because they are less visible. Since companies and individuals do not like to put their failed attempts on display, it is easy to neglect the failures. Indeed, finding data on failed companies proved to be a monumentally difficult task when cleaning the data set. Another interesting point is KPCB’s remarkably high IPO rate: a company was more likely to go public if KPCB invested than with any other investor.

Predicting a Start-up’s Success

We explored building a generalized linear model to see whether we could predict if a start-up will succeed (IPO or acquisition) or fail (close or shut down). We fit a regularized logistic regression model using predictive features such as company category, geographical region, rounds of funding, total funding and the year in which the start-up was founded. Using this model, we can try to predict which still-operating companies in the dataset are likely to succeed and which may shut down. Note that this may not be a fair measure of prediction, because younger companies may not have had enough time to build up their feature vectors. We therefore restricted the analysis to companies founded between 1997 and 2011. The features are a mix of continuous numerical and categorical variables: company category, region, and founding year were treated as categorical. Categorical features are converted to numerical ones using the standard trick of expanding the feature set, transforming a categorical feature with K possible values into K features, each with a binary value.
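The encoding and model described above can be sketched in a few lines. The snippet below is only an illustration in Python, not the R code used for this report: the toy rows, the log transform of funding, the penalty strength and the helper names (one_hot, fit_logistic, predict_success) are all assumptions made for the example.

```python
import math

def one_hot(values):
    """Expand a categorical column with K levels into K binary features."""
    levels = sorted(set(values))
    return levels, [[1.0 if v == lvl else 0.0 for lvl in levels] for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, l2=0.1, lr=0.5, steps=2000):
    """Batch gradient descent on the L2-penalised logistic loss (bias unpenalised)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy rows: (category, total funding in $M, succeeded?)
toy = [("software", 100.0, 1), ("software", 1.0, 0),
       ("biotech", 100.0, 1), ("biotech", 1.0, 0),
       ("mobile", 100.0, 1), ("mobile", 1.0, 0)]

levels, cat_features = one_hot([row[0] for row in toy])
# Feature vector = K binary category columns + log10 of total funding.
X = [cats + [math.log10(row[1])] for cats, row in zip(cat_features, toy)]
y = [row[2] for row in toy]
w, b = fit_logistic(X, y)

def predict_success(category, funding_musd):
    """Predicted probability of success; failure is 1 minus this value."""
    x = [1.0 if category == lvl else 0.0 for lvl in levels]
    x.append(math.log10(funding_musd))
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

p_high = predict_success("software", 100.0)
p_low = predict_success("software", 1.0)
```

In this toy data the category carries no signal and funding separates the outcomes, so the fitted model assigns a higher success probability to the well-funded company. The L2 penalty keeps the weights finite even though the toy data is perfectly separable.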

Company        Type           Region         Founded  Success
SurveyMonkey   software       SF Bay         1999     0.99
Sigmacare      health         New York       2005     0.99
ZeniMax        games video    Washington DC  1999     0.98
Twitter        social         SF Bay         2006     0.97
Bloom Energy   cleantech      SF Bay         2002     0.97
Fisker         automotive     Los Angeles    2008     0.97
Pinterest      social         SF Bay         2009     0.96
Solyndra       manufacturing  SF Bay         2005     0.96
LivingSocial   ecommerce      Washington DC  2007     0.95
GreatPoint En  cleantech      Boston         2004     0.93

Table 1 Predicting the Successful Companies.

Company        Type         Region        Founded  Failure
JumpTheClub    mobile       Hartford      2010     0.99
PureHistory    search       Somerset      2011     0.99
Striped Sail   hardware     Champaign     2010     0.99
Apps Genius    games video  Red Bank      2009     0.99
Energy Web     ecommerce    Allentown     2004     0.99
GlucoSentient  biotech      Champaign     2011     0.99
ANDalyse       other        Champaign     2005     0.99
Forcura        biotech      Jacksonville  2010     0.99
Southtree      web          Chattanooga   2009     0.99
PrintEco       software     Champaign     2010     0.99

Table 2 Predicting the Shutdown Companies.

The most predictive features were total funding, region, and industry category (especially Biotech, Cleantech, Software, Video Games and Hardware). Table 1 lists the companies the model predicts have the greatest chance of being acquired or going public, whereas Table 2 lists those it predicts are most likely to shut down. The Success and Failure columns give the predicted probability of success and of failure (1 - Success), respectively. An interesting result was the prediction of Twitter’s IPO, which occurred on November 6th, 2013, two days after the data set was published. We were pleasantly surprised by the accuracy of our results, but the model is not perfect. Another company it predicts as highly likely to succeed is Solyndra, which famously went bankrupt amid accusations of fraud and of misrepresenting corporate finances to obtain government funding. This is an important caution against taking the predictions of the linear model at face value. Still, we believe even this failure is informative: our model is mostly predicated on the amount of funds raised, and Solyndra had a management dispute that could not have been predicted by any of the important features in the model. An argument could be made that the case of Solyndra was anomalous. Note also that the model was trained on a much smaller set than the one it makes predictions for, so a reasonable amount of overfitting may have taken place. Furthermore, because the model was trained to predict success, it does not do quite as good a job of predicting failures, since the features that predict success are not necessarily the ones that predict failure. The model is likely unfit for any true prediction, but it performed well enough to make a case for which features are the most predictive when looking at early investment in start-up companies.

Post-IPO

We wanted to look at the companies in our dataset that had an IPO to see if we could identify any notable trends. The first step was to discover the companies’ ticker symbols. We built a function, find_symbol, that called Yahoo! Finance’s search using the company name as input and returned the first search result. However, because the dataset only has information about companies in their startup phase, the search failed for many companies that had gone through significant changes such as mergers, post-IPO bankruptcies, buyouts, name changes, and ticker symbol changes. Other websites’ search functions had the same issue. To overcome this, we fed the function a dummy value for the companies for which the search failed to return the correct symbol, and found the missing symbols manually. Because find_symbol queries a website, it runs very slowly; to speed up our code we saved the symbols in a file.

In the base data set, 377 companies are listed as having had an IPO. After running the find_symbol function, we had symbols for 332 active companies. We excluded companies that were bought out, merged, or went bankrupt, because we wanted to focus on how companies are performing currently, and because finding historical stock data for companies that no longer exist or that underwent some significant change proved very difficult.

We used these symbols to query Yahoo! Finance using a function called getKeyStats_xpath, which scrapes a company profile page for relevant valuation statistics. This function returned many interesting features such as Market Cap, Revenue and EBITDA. We restricted our analyses to the following features, because we found them more reliable and they had a minimal number of NA values: Market Cap, Enterprise Value, Enterprise Value/Revenue, Enterprise Value/EBITDA, Revenue, Revenue Per Share, Gross Profit, EBITDA, Total Cash, Total Cash Per Share, 52-Week Change, Shares Outstanding, Float, % Held by Insiders, PEG Ratio.
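The lookup-then-cache workflow described above might look roughly like the following. This is a Python sketch rather than the report's R code: find_symbols, its parameters, the JSON cache format and fake_lookup (which stands in for the slow web search) are hypothetical names chosen for the example.

```python
import json
import os
import tempfile

def find_symbols(names, lookup, cache_path, manual=None):
    """Resolve company names to ticker symbols, caching results on disk.

    lookup: callable taking a company name and returning a symbol or None
            (a stand-in for the slow web search).
    manual: optional dict of hand-corrected symbols that override the search.
    """
    manual = manual or {}
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            cache = json.load(fh)
    for name in names:
        if name in manual:
            cache[name] = manual[name]
        elif name not in cache:
            cache[name] = lookup(name)  # None marks "needs manual review"
    with open(cache_path, "w") as fh:
        json.dump(cache, fh, indent=2, sort_keys=True)
    return cache

# Demo with a fake search that records how often it is called.
calls = []

def fake_lookup(name):
    calls.append(name)
    return {"Google": "GOOG", "Twitter": "TWTR"}.get(name)

cache_file = os.path.join(tempfile.mkdtemp(), "symbols.json")
names = ["Google", "Twitter", "Zynga"]
first = find_symbols(names, fake_lookup, cache_file)
# Second run: cached names are not searched again; Zynga is fixed by hand.
second = find_symbols(names, fake_lookup, cache_file, manual={"Zynga": "ZNGA"})
```

A name whose search fails is stored as a missing value rather than retried on every run, which mirrors the dummy-value approach: the slow search runs once per company, and the gaps are filled by the manual table.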

Company          Market Cap  Enterprise Value  Revenue  EBITDA*  Total Cash
Verizon          141         186               120      34       57
Google           354         297               57       18       55
Raytheon         27          29                24       3        4
Lockheed Martin  43          49                46       5        3
Texas Instru     46          48                12       4        4
Honeywell        68          72                38       6        6
Xerox            14          20                22       3        1
News Corp        10          7                 9        1        3
General Elec     270         652               145      29       10
Quintiles        5           7                 4        1        1

Table 3 Top 10 IPO Companies.

Breaking down the percentage change within the last 52 weeks by industry gives a rough picture of which industries are booming, busting, predictable, or volatile. Figure 7 covers the top industries identified earlier plus any industry with more than 15 companies represented in the data set. Biotech, software, and cleantech have the highest variances, making investment in those three industries high-risk, high-reward. Surprisingly, the safest industry for company growth currently appears to be video games, with an average share price increase of over fifty percent. The worst-performing industries in terms of growth were advertising and e-commerce. Note that this graphic must be interpreted with skepticism, because a number of factors can affect share price growth. For example, the youngest companies tend to show much larger percentage changes in either direction because they are smaller, so industries with more young companies may have inflated figures. Furthermore, because this data excludes companies that failed post-IPO, merged, or were bought out, one cannot make inferences about these outcomes, which is likely what an investor looking at percentage growth would want to do.

Figure 7 The histogram shows the success of the top 20 industries.
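The per-industry breakdown amounts to grouping the 52-week change by industry and summarizing its mean and spread. A minimal Python sketch follows; the records are made-up illustrative numbers, not values from our data set, and summarize_by_industry with its min_companies threshold is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, pstdev

def summarize_by_industry(records, min_companies=2):
    """Group 52-week % changes by industry; report count, mean and spread.

    Industries with fewer than min_companies entries are dropped, since a
    single company says little about an industry.
    """
    groups = defaultdict(list)
    for industry, change in records:
        groups[industry].append(change)
    return {
        industry: {"n": len(vals), "mean": mean(vals), "spread": pstdev(vals)}
        for industry, vals in groups.items()
        if len(vals) >= min_companies
    }

# Hypothetical (industry, 52-week % change) pairs, for illustration only.
records = [
    ("biotech", -40.0), ("biotech", 10.0), ("biotech", 120.0),
    ("games_video", 45.0), ("games_video", 55.0),
    ("advertising", -20.0),
]
summary = summarize_by_industry(records)
```

Reading the mean and spread together is what separates a "safe" industry (high mean, small spread) from a high-risk, high-reward one (large spread).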

Another interesting comparison was between Enterprise Value (EV)/Revenue (R) and Enterprise Value/EBITDA. Both ratios are popular for valuing a company. EV/R relates the company’s value to its ability to generate sales, while EV/EBITDA relates it more directly to the company’s profits. We expected these numbers to be highly correlated, but found that they are not. The medical industry is a good example: it has an average EV/R multiple of around 10, which seems high, while its average EV/EBITDA was around -2.7. A negative EV/EBITDA implies negative EBITDA, which suggests that medical companies are valued highly relative to their revenue even though many do not enjoy large profit margins. However, many possible scenarios can produce any one ratio value, so not much weight can be given to these statistics alone. Furthermore, it is generally considered meaningless to compare investment multiples across industries; they are better used to compare similar companies. Finally, the number of companies per industry is very low, so a very large or very small company could skew the industry average significantly.

We also wanted to identify the “best” publicly traded companies in our data set. To do this, we ranked every company on a number of key metrics, and then calculated an aggregate rank score from those rankings. The metrics were: Market Cap, Enterprise Value, Revenue, Revenue per share, Gross Profit, EBITDA, Total Cash, and PEG ratio.

This algorithm identified the top ten public companies in our data set, shown in Table 3. Not surprisingly, nearly all of these company names are well known, and they are all very large and stable companies. These companies would be a good “safe bet” investment. However, investment in these companies is not guaranteed to yield very high returns, because the algorithm’s calculations are based mostly on metrics that indicate a company’s total size, not its growth.
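To make the two multiples concrete, the Python sketch below computes them for four rows of Table 3 and measures how strongly they move together. The figures are the rounded values printed in the table, so the ratios are approximate; multiple and pearson are helper names introduced only for this illustration.

```python
def multiple(enterprise_value, denominator):
    """Return EV / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return enterprise_value / denominator

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# (Enterprise Value, Revenue, EBITDA) as printed in Table 3 (rounded).
table3 = {
    "Verizon": (186, 120, 34),
    "Google": (297, 57, 18),
    "General Elec": (652, 145, 29),
    "Xerox": (20, 22, 3),
}

ev_r = {name: multiple(ev, rev) for name, (ev, rev, _) in table3.items()}
ev_ebitda = {name: multiple(ev, ebitda) for name, (ev, _, ebitda) in table3.items()}
r = pearson(list(ev_r.values()), list(ev_ebitda.values()))

# A company with negative EBITDA gets a negative multiple, as in the
# medical-industry average discussed above (illustrative numbers).
loss_making = multiple(27.0, -10.0)
```

A negative EBITDA flips the sign of EV/EBITDA while leaving EV/R positive, which is one way the two ratios can diverge within an industry.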
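One simple way to build such an aggregate rank score is to rank companies on each metric and sum the ranks. The Python sketch below does this for three metrics and four companies taken from Table 3. Summing the ranks, treating higher as better for every metric, and sharing ranks on ties are assumptions of this example; a metric such as PEG ratio, where lower is often read as better, would need its ranking direction reversed.

```python
def rank_desc(values):
    """Competition ranking: 1 for the largest value; ties share a rank."""
    return [1 + sum(other > v for other in values) for v in values]

def aggregate_rank(companies, metrics):
    """Sum each company's per-metric ranks; a lower total is better."""
    names = list(companies)
    totals = {name: 0 for name in names}
    for metric in metrics:
        ranks = rank_desc([companies[name][metric] for name in names])
        for name, rank in zip(names, ranks):
            totals[name] += rank
    ordering = sorted(names, key=lambda n: (totals[n], n))
    return ordering, totals

# Values as printed in Table 3.
companies = {
    "Verizon": {"market_cap": 141, "revenue": 120, "total_cash": 57},
    "Google": {"market_cap": 354, "revenue": 57, "total_cash": 55},
    "Xerox": {"market_cap": 14, "revenue": 22, "total_cash": 1},
    "Quintiles": {"market_cap": 5, "revenue": 4, "total_cash": 1},
}
ordering, totals = aggregate_rank(companies, ["market_cap", "revenue", "total_cash"])
```

Because every metric here scales with company size, the ordering rewards large companies, which is the limitation noted above.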

Conclusions

Startups are essential for ensuring a competitive marketplace, and they are the front lines of innovation in both well-established industries and emerging new markets. The USA’s ability to foster the growth of fledgling startups and support the smaller, exciting companies with promise is a significant strength of its economy. The explosion of the software industry and the subsequent related markets like web, mobile, social networking, etc. was made possible by the growing number of investors trying to get in on the next big company from an early phase and the ever-enlarging pool of investment dollars available for early startup funding.

A number of factors influence startup funding, and we found that startup funding fluctuates with the economy to a large degree. Furthermore, startups with a connection to San Francisco, New York, Los Angeles, Boston, or any other major investment center were much more likely to succeed. Indeed, we confirmed that most startup funding occurs around major metropolitan areas, as one might expect.

Certain industries have become much more prominent for startup investment over the years. For the last decade, software has been the undisputed giant, with more software companies being founded yearly than in any other industry. Biotech has also seen steady growth, and more recently, new web, mobile app, and enterprise companies have become more common. Some industry sectors like Software are certainly “safer” than others like Biotech and Cleantech, and therefore the chance of success can depend greatly on industry.

Our model was built solely to give investors insight into their future investments. We found that the most predictive features of a startup’s success are the amount of money received, region, industry, and year of founding. Using just these features, one can predict with reasonable certainty which companies will go public or be acquired, since one can assume investor confidence in those companies that receive more funding. However, no model is perfect, and predicting a company’s future based solely on external factors like funding received and location cannot be all-encompassing, since it fails to take into account the strength of a company’s internal qualities such as employee ability and quality of product or service.

Still, investors do a remarkable job of not wasting their money. The top 20 investors had very good success rates across the board, but it was interesting to note that most of them preferred an acquisition to an IPO. Very few companies manage to reach the IPO stage, and to get there one must reject many opportunities to cash in, which is not always in an investor’s best interest. However, the low shut-down rates make us suspect survivorship bias.

Looking post-IPO, we tried to identify some factors that indicate a stable company, and to see which companies in our data set that began as startups managed to reach success. Big names like Verizon and Google dominate the list, as one might expect, but perhaps a more interesting analysis was to look at industries and their growth; we found that the emerging industries with more startups were also those with a much more unpredictable spread of 52-week percent change in share price. Certain industries, like video gaming, appeared to offer a surer investment, while others, like advertising, seemed only to plummet post-IPO. This is interesting when analyzed in conjunction with the remarkably high rate of acquisition of advertising companies: perhaps advertising startups know that success in the public market is less likely for their industry, and so are more willing to cash out.

We believe we have painted an accurate and interesting picture of the startup landscape in the USA for the past 20 years. Startups drive the economy, and there seems to be an explosion in their prevalence after every new market emerges. The future holds promise and uncertainty for startups in the USA, but by analyzing their funding, we can make smart educated guesses on their future success and look to capitalize on smart investments.

About

This report is a joint project by Aayush Raman (Baylor College of Medicine and the University of Texas, MD Anderson Cancer Center), Yinsen Miao and Gabriel Rubio Breternitz (Department of Statistics, Rice University). A full PDF copy of the report and the related R script are available on request.

The pdf form report can be downloaded here.

Introduction

Crunchbase released a dataset detailing approximately 18,500 Startups since 1920. The dataset contains a number of interesting features including startup locations, rounds of investments by type, early investors, monetary value of investments, company status and outcome. We used the dataset that was published on November 4th, 2013 and our central assumption made is that the companies’ funding information is reliable and trustworthy. We recognize that identifying a promising startup can be a tricky problem, because the most innovative startups tend to disrupt existing markets. However, this dataset allows us to “follow the money” and gain insights into what a successful Startup looks like, with a focus towards how an investor could exploit this knowledge for monetary gain. We follow the startups all the way to their IPO and beyond into the stock market to see which companies that came from humble beginnings transformed themselves into large and financially stable companies ruling the market today. For the IPO part, We used the stock information from Yahoo Finance to get the information about the public companies in our dataset.

Trends in Investments

While working on the time series dataset, we chose to look at two important factors: total investment and number of startup companies in a given year. These two factors are important because they can be indicative of the general state of the economy in the United States as well as the success of the startup companies. The Figure below provides detailed information about a pattern of investment since 1987.

Figure 1 The plot above shows the number of investment in United States from 1987 to 2013. The color filling indicates the number of Startup companies in that year. The six vertical red dotted lines represent important time events.

Figure 2 The heat map shows the temporal trend of different types of industry. ( Note that “Cleantech” industry refers to companies that look to be innovative in the area of biofuels, electric vehicles, solar panels and advanced nuclear technologies.)

One interesting thing to notice is that total investment rose briefly around when Google started in 1998. However, shortly after, investment declined in reaction to the implosion of the market following the bursting of the dot-com bubble or internet bubble. Then after 2004 America experienced an overall increase in investment, which coincides with when Facebook was founded. We may suggest the investors were in search of the next Google leading to an exponential increase in the investment. One possible reason for the swift increase of investment could be the increased popularity of Internet usage for social networking and marketing through sites like Orkut, Linkedin and small online businesses. From year 2005, the slope of investment decreased, since the U.S. housing bubble bursted and peaked in 2006. This was the time when the loans and investment from financial institutions decreased. That interbank credit crisis led to a 23% decrease in total investment.

The software companies seemed less affected by the financial recession however, since the number of software and mobile startup companies actually still increased ( shown in Figure 2 ). One explanation could be the ascent of smart phones like the iPhone and Android phones which created entirely new markets for app development and mobile applications. Total investment remained rather low from 2009 to 2010 and then it revived as the economic conditions improved. After investment reached its highest point in 2011, the graph again shows a decrease in seed investments and number of new startups founded. We may say that the investors are struggling to attract the best founders and make seed investments in promising companies. Also, we can say that most young accelerators and incubators seem destined to fail because of the overcrowded market for early stage funding. It appears that the market may have found a saturation point; in the past, investment tended to increase whenever a new market emerged, but there hasn’t been an entirely new market created due to technological advancement since the mobile app market emerged. Should another new technology prove to be transformative such that it creates more room for startups, then we could assume another spike in investments and number of startups founded.

Geographical Trends

The dataset contains 42 categories of companies, spread across 601 locations. The map in Figure 3 shows the state-by-state distribution of total investment dollars invested. Massachusetts, California, Texas and Washington are the top states for startups, since investors in those four states collectively raise the most money for new companies. Also note that in addition to a strong presence in the software industry, Massachusetts and SF Bay are inundated with Biotech companies as well. California has a large amount of small and diverse biotech companies whereas Massachusetts has a more focused segment of the Biotech industry, with most companies working in the field of drug discovery and development. Therefore, the investment is large as compared to other states. Texas and Washington also raise very high dollar amounts due to their advantageous-to-business tax environments – most notably, the lack of a state personal or corporate income tax.

As one might expect, most of the startups in our data set are clustered around major cities. However, we must be careful when making assumptions from this map, because our data set is biased towards companies that received some form of venture funding. Thus, there may be many startups in other places that did not receive funds, so they are not represented here.

Figure 3 The map shows the geographic location of investments and the investment each state received from 1987 to 2013. It shows the top 10 startup companies by investment, grouped into regions, and the sum of investment received by each state.

Figure 4 The heat map shows the geographical trend of different types of industry.

The heat map in Figure 4 shows the top ten regions for startups and how they are represented across the top ten types of industries. It shows that the San Francisco (SF) Bay area, Greater New York, Greater Los Angeles and Boston are the most popular hubs for startup companies in virtually every top industry. The software industry in particular is present in all ten regions, with the SF Bay area, not surprisingly, being the most popular. The Biotech industry is dominant in SF Bay, Boston, Greater Los Angeles, San Diego, Washington DC and Seattle. We believe this could be due to the presence of elite universities and research institutes such as Harvard and MIT in the Boston area, Stanford and UC Berkeley around the SF Bay area, the University of Washington in Seattle, and the National Institutes of Health and Johns Hopkins University around the DC area.

Different areas seem to engender different types of startups. New York has a strong showing in the advertising, web and software industries, but seems surprisingly lacking in the Biotech and Cleantech startups one might expect to find in such a large and investor-dense region. Boston also has a noteworthy number of “enterprise” companies, which are geared more towards facilitating business functions than towards selling products or services to consumers. This could help explain the region’s remarkably dominant software industry, which relies heavily on business-to-business services.

Type of Investment

The type of funding is an important criterion for entrepreneurs who need to cover the early costs of starting a business. Series A refers to the first round of stock offered to investors during the early stage. Typical Series A rounds fall in the range of $2-5M in exchange for 20-40% of the company, and are intended to support a company through the early stages of building a business, from product development to hiring to marketing. Series B refers to second-stage financing. Series C/C+ funding is raised as a company grows and seeks additional funds to meet future milestones; Facebook and Twitter, for example, both raised Series C+ funding. Angel investors and venture capitalists are wealthy individuals or small firms that provide seed funding to new startups. They may also invest in a public company through instruments such as stock or convertible debt; for example, Warren Buffett invested in Goldman Sachs during the recession. Equity funding, which includes equity crowd-funding, is an umbrella term for any means of financing a company in which it receives money in exchange for issuing shares of its stock.

Figure 5 Type, Rounds and Amount of Investment in Various Types of Industry.

Figure 5 shows a clear trend of financial institutions investing more than individuals. Individual investors are more interested in funding software, mobile and web companies, since they tend to seek more immediate gains than financial institutions do. The number of Series A/B/C+ and venture funding rounds is noticeably higher for software than for biotech, although total investment in the biotech industry is higher than in software. This is to be expected, because developing a product takes far more time and money in biotech than in software. It also suggests that, although the risk in biotech is high, investors are willing to invest because the potential return on investment is very high compared to the software industry. For example, if a biotech company develops a drug that cures cancer or tuberculosis, or a successful diagnostic product, its profit margin is likely to be higher than that of a typical software company.

Performance of Investors

We were curious about the most important investors, so we looked at how successful different investors are at picking startups, that is, whether the companies they fund end up acquired, going public, or shut down. To do this, we excluded companies in the data set that were still operating and focused on those with a definite end, be it a buyout, an IPO, or a bankruptcy. We could not obtain data on how much of a company was sold during a merger or acquisition, or on how many shares a particular investor sold during an IPO, so the link between these outcomes and financial performance is tenuous. For example, in the case of Google and Apple, an investor would have gained a lot by keeping their shares rather than selling them, whereas in the case of Zynga, a well-known online video game company, the share price plummeted to roughly a quarter of the initial IPO price.
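The tally described above can be sketched in a few lines. This is a minimal Python illustration rather than the authors' R script; the investor names, the status labels and the `outcome_rates` helper are all hypothetical and are not taken from the Crunchbase data.

```python
from collections import Counter, defaultdict

# Hypothetical (investor, company status) pairs, made up for illustration.
investments = [
    ("Investor A", "acquired"),
    ("Investor A", "ipo"),
    ("Investor A", "operating"),
    ("Investor A", "closed"),
    ("Investor B", "acquired"),
    ("Investor B", "acquired"),
    ("Investor B", "operating"),
]

# Only companies with a definite end are counted.
FINAL_STATUSES = {"acquired", "ipo", "closed"}


def outcome_rates(pairs):
    """Return, per investor, the share of each final outcome.

    Companies that are still operating are dropped, because their outcome
    is not yet known.
    """
    counts = defaultdict(Counter)
    for investor, status in pairs:
        if status in FINAL_STATUSES:
            counts[investor][status] += 1
    rates = {}
    for investor, counter in counts.items():
        total = sum(counter.values())
        rates[investor] = {s: counter[s] / total for s in sorted(FINAL_STATUSES)}
    return rates


rates = outcome_rates(investments)
for investor, shares in rates.items():
    success = shares["acquired"] + shares["ipo"]
    print(investor, shares, "success rate:", round(success, 2))
```

Because closed companies are likely underreported, a rate computed this way should be read as an upper bound on an investor's true success rate.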

Figure 6 The histogram shows the success of the top 20 investors.

Figure 6 suggests that the top investors are Sequoia Capital, SV Angel and New Enterprise Associates, as they have the highest success rates. It is interesting to note that most of the companies the top investors choose to fund end up being acquired, and have a rather small chance of achieving an initial public offering. This seems to reinforce the trend that many startups prefer to leverage their success into a buyout and avoid the uncertainty of going public. Furthermore, given the rather unpredictable nature of venture capital, it is likely that investors prefer a “sure thing” like an acquisition to an IPO. It is important to note that the “Closed” category is likely heavily underrepresented in the data set, which hints at “survivorship bias”. Survivorship bias is the logical error of concentrating on the companies that “survived” and inadvertently overlooking those that did not, because the latter are less visible. Since companies and individuals do not like to put their failed attempts on display, it is easy to neglect the failures. Indeed, finding data on failed companies proved to be a monumentally difficult task when cleaning the data set. Another interesting point is KPCB’s remarkably high IPO rate: a company was more likely to go public if KPCB had invested in it than if any other investor had.

Predicting a Start-up’s Success

We explored building a generalized linear model to see if we could predict whether a start-up will succeed (go public or be acquired) or fail (close or shut down). We fit a regularized logistic regression model to the data, using predictive features such as company category, geographical region, rounds of funding, total funding and the year in which the start-up was founded. Using this model, we can try to predict which of the companies in the dataset that are still operating are likely to succeed and which may shut down. Note that this may not be a fair measure of prediction, because younger companies may not have had enough time to build up their feature vectors. We therefore restricted the analysis to companies founded between 1997 and 2011. We have a mix of continuous numerical and categorical features. The company category, region, and founding year were categorical variables. Categorical features are converted to numerical ones using the standard trick of expanding the feature set: a categorical feature with K possible values is transformed into K features, each with a binary value.
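The encoding and model fit described above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R code: the toy rows, the helper names (`one_hot`, `fit_logistic_l2`, `predict_proba`) and the hyperparameters `lam`, `lr` and `epochs` are all assumptions chosen for the example, and an L2 penalty is assumed because the report does not say which regularizer was used.

```python
import math


def one_hot(values):
    """Expand a categorical column with K distinct values into K binary columns."""
    levels = sorted(set(values))
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows


def predict_proba(w, b, x):
    """Logistic model: probability of success for one feature vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_l2(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent with an L2 penalty.

    lam, lr and epochs are illustrative defaults, not values from the report.
    The penalty also keeps the fit well-defined even though the K one-hot
    columns are collinear with the intercept.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: category, log10 of total funding, and outcome.
categories = ["software", "biotech", "software", "mobile", "biotech", "mobile"]
log_funding = [7.5, 8.0, 5.5, 5.0, 7.8, 6.0]
succeeded = [1, 1, 0, 0, 1, 0]

levels, encoded = one_hot(categories)
# Append the (roughly centred) numeric feature to the binary columns.
X = [row + [f - 6.5] for row, f in zip(encoded, log_funding)]
w, b = fit_logistic_l2(X, succeeded)

print("one-hot levels:", levels)
print("P(success | software, log funding 8.0):",
      round(predict_proba(w, b, [0.0, 0.0, 1.0, 1.5]), 2))
```

In practice a library routine would replace the hand-written optimizer; the loop is spelled out here only to show where the penalty term enters the weight update.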

Company        Type           Region         Founded  Success
SurveyMonkey   software       SF Bay         1999     0.99
Sigmacare      health         New York       2005     0.99
ZeniMax        games video    Washington DC  1999     0.98
Twitter        social         SF Bay         2006     0.97
Bloom Energy   cleantech      SF Bay         2002     0.97
Fisker         automotive     Los Angeles    2008     0.97
Pinterest      social         SF Bay         2009     0.96
Solyndra       manufacturing  SF Bay         2005     0.96
LivingSocial   ecommerce      Washington DC  2007     0.95
GreatPoint En  cleantech      Boston         2004     0.93

Table 1 Predicting the Successful Companies.

Company        Type         Region        Founded  Failure
JumpTheClub    mobile       Hartford      2010     0.99
PureHistory    search       Somerset      2011     0.99
Striped Sail   hardware     Champaign     2010     0.99
Apps Genius    games video  Red Bank      2009     0.99
Energy Web     ecommerce    Allentown     2004     0.99
GlucoSentient  biotech      Champaign     2011     0.99
ANDalyse       other        Champaign     2005     0.99
Forcura        biotech      Jacksonville  2010     0.99
Southtree      web          Chattanooga   2009     0.99
PrintEco       software     Champaign     2010     0.99

Table 2 Predicting the Shutdown Companies.

The most predictive features were total funding, industry category (especially Biotech, Cleantech, Software, Video Games and Hardware) and region. Table 1 lists the companies the model predicts have a high chance of being acquired or going public, whereas Table 2 lists the companies it predicts are most likely to shut down. The Success and Failure columns give the predicted probability of success and of failure (1 - Success), respectively. An interesting result was the prediction of Twitter’s IPO, which occurred on 6 November 2013, two days after the data set was published on 4 November 2013. We were pleasantly surprised by the accuracy of our results, but the model is still not perfect. Another company it rates as highly likely to succeed is Solyndra, which famously went bankrupt amid accusations of fraud and of misrepresenting corporate finances to obtain government funding. This is an important caution against taking the predictions of the linear model at face value. Still, we believe that even this failure is instructive: our model is mostly predicated upon the amount of funds raised, and Solyndra had a management dispute that could not be predicted by any of the important features in the model. An argument could be made that the case of Solyndra was anomalous. Note also that the model was trained on a much smaller set than the set it was used to predict, so a reasonable amount of overfitting may have taken place. Furthermore, because the model was trained to predict success, it does not do quite as good a job of predicting failures, since the features that predict success are not necessarily the same ones that predict failure. The model is likely unfit for any true prediction, but it nevertheless performed well enough to make a case for which features are the most predictive when looking at early investment in start-up companies.

Post-IPO

We wanted to look at the companies in our dataset that had an IPO to see if we could identify any notable trends. To do this, we first had to discover the companies’ ticker symbols. We built a function, find_symbol, that called Yahoo! Finance’s search function with the company name as input and returned the first search result. However, because the dataset only has information about companies in their startup phase, the search did not work properly for many companies that had gone through significant changes such as mergers, post-IPO bankruptcies, buyouts, name changes, and ticker symbol changes. Other websites’ search functions had the same issue. To overcome this, we fed the function a dummy value for the companies for which the search failed to return the correct symbol, and found the missing symbols manually. Because find_symbol queries a website, it runs very slowly; to make our code run faster, we saved the symbols in a file.
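The lookup-with-cache pattern described above might look roughly like the sketch below. It is written in Python rather than the authors' R, and it is not their find_symbol function: `lookup_symbols`, its parameters, and the `fake_search` stand-in for the web query are hypothetical names introduced for illustration, and no real web API is called.

```python
import json
import os
import tempfile


def lookup_symbols(names, search, cache_path, manual_overrides=None):
    """Map company names to ticker symbols, caching results on disk.

    `search` is any callable taking a company name and returning a symbol
    or None; it stands in for a slow web lookup and is a placeholder here.
    """
    manual_overrides = manual_overrides or {}
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            cache = json.load(fh)
    for name in names:
        if name in manual_overrides:
            cache[name] = manual_overrides[name]  # hand-checked value wins
        elif name not in cache:
            cache[name] = search(name)            # slow call, made at most once
    with open(cache_path, "w") as fh:
        json.dump(cache, fh, indent=2)
    return {name: cache[name] for name in names}


# Stand-in for the web search; records each call so the caching is visible.
calls = []


def fake_search(name):
    calls.append(name)
    return {"Google": "GOOG"}.get(name)


cache_file = os.path.join(tempfile.mkdtemp(), "symbols.json")
overrides = {"Verizon": "VZ"}  # a symbol entered manually after a failed search

first = lookup_symbols(["Google", "Verizon"], fake_search, cache_file, overrides)
second = lookup_symbols(["Google", "Verizon"], fake_search, cache_file, overrides)
print(first)
print("search calls made:", calls)
```

The second call reads everything from the saved file, so the slow search runs only once per company.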

In the base data set, 377 companies are listed as having had an IPO. After running the find_symbol function, we found symbols for 332 active companies. We chose to exclude companies that were bought out, had a merger, or went bankrupt, for two reasons: we wanted to focus on how the companies are performing currently, and finding historical stock data for companies that no longer exist or that underwent some significant change proved to be very difficult.

We used these symbols to query Yahoo! Finance with a function called getKeyStats_xpath, which scrapes a company profile page for any relevant valuation statistics. This function returned many interesting features such as Market Cap, Revenue, and EBITDA. We restricted our analyses to the following features, because we found them to be more reliable and they had a minimal number of NA values: Market Cap, Enterprise Value, Enterprise Value/Revenue, Enterprise Value/EBITDA, Revenue, Revenue Per Share, Gross Profit, EBITDA, Total Cash, Total Cash Per Share, 52-Week Change, Shares Outstanding, Float, % Held by Insiders, PEG Ratio.

Company          Market Cap  Enterprise Value  Revenue  EBITDA*  Total Cash
Verizon          141         186               120      34       57
Google           354         297               57       18       55
Raytheon         27          29                24       3        4
Lockheed Martin  43          49                46       5        3
Texas Instru     46          48                12       4        4
Honeywell        68          72                38       6        6
Xerox            14          20                22       3        1
News Corp        10          7                 9        1        3
General Elec     270         652               145      29       10
Quintiles        5           7                 4        1        1

Table 3 Top 10 IPO Companies.

Breaking down the percentage change in share price over the last 52 weeks by industry gives a rough picture of which industries are booming, busting, predictable, or volatile. Figure 7 shows the top industries identified earlier, plus any industry with more than 15 companies represented in the data set. Biotech, software, and cleantech have higher variances, making investment in those three industries high-risk, high-reward. Surprisingly, the safest industry for company growth currently appears to be video games, with an average share price increase of over fifty percent. The worst-performing industries in terms of growth were advertising and e-commerce. Note that this graphic must be interpreted with skepticism, because a number of factors can affect share price growth. For example, the youngest companies tend to show much larger percentage changes in absolute terms because they are smaller, so industries with more young companies may have inflated figures. Furthermore, because this data excludes companies that failed post-IPO, had a merger, or were bought out, one cannot make inferences about these outcomes, which is likely what an investor looking at percentage growth would want to do.

Figure 7 The histogram shows the success of the top 20 industries.

Another interesting comparison was between Enterprise Value (EV)/Revenue (R) and Enterprise Value/EBITDA. These two ratios are popular for evaluating the value of a company. EV/R relates the company’s value to its ability to generate funds, while EV/EBITDA relates it more directly to the company’s profits. We expected these numbers to be highly correlated, but found that they are not. The medical industry is a good example of how this can happen: it has an average EV/R multiple of around 10, which seems high, while its average EV/EBITDA was around -2.7. A negative EV/EBITDA implies negative EBITDA, which suggests that medical companies are valued highly relative to their revenue but do not enjoy large profit margins. However, one must remember that there are many possible scenarios behind any one ratio value, so not much weight can be given to these statistics alone. Furthermore, comparing investment multiples across industries is generally considered meaningless; they are better used to compare similar companies. Finally, the number of companies being compared per industry is very low, so a single very large or very small company could skew the industry average by a significant amount.
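As a small worked illustration of the two multiples, the Python sketch below computes them for a few rows of Table 3 and then measures how strongly they move together. The figures are the rounded values printed in the table, so the ratios are approximate; the helper names `multiples` and `pearson` are introduced here for illustration, and the report does not state which correlation measure the authors used.

```python
import math

# (Enterprise Value, Revenue, EBITDA) as printed in Table 3, same units.
table3 = {
    "Verizon": (186, 120, 34),
    "Google": (297, 57, 18),
    "Raytheon": (29, 24, 3),
    "Honeywell": (72, 38, 6),
    "General Elec": (652, 145, 29),
}


def multiples(ev, revenue, ebitda):
    """Return (EV/Revenue, EV/EBITDA); a negative EBITDA gives a negative multiple."""
    return ev / revenue, ev / ebitda


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


ev_r, ev_ebitda = zip(*(multiples(*row) for row in table3.values()))
for name, r, e in zip(table3, ev_r, ev_ebitda):
    print(f"{name:13s} EV/R = {r:5.2f}  EV/EBITDA = {e:6.2f}")
print("correlation:", round(pearson(ev_r, ev_ebitda), 2))
```

With only a handful of large companies, the correlation printed here says little about the full data set; the point is only to show how each multiple is formed and how a loss-making company ends up with a negative EV/EBITDA.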

We also wanted to identify the “best” companies in our data set that are currently publicly traded. To do this, we ranked every company on a number of key metrics, and then calculated an aggregate rank score from those rankings. We ranked the companies on the following metrics: Market Cap, Enterprise Value, Revenue, Revenue Per Share, Gross Profit, EBITDA, Total Cash, and PEG Ratio.

This algorithm identified the top ten public companies in our data set as shown in Table 3. Not surprisingly, nearly all of these company names are well known, and they are all very large and stable companies. These companies would be a good “safe bet” investment. However, investment in these companies is not guaranteed to return at very high levels, because the algorithm’s calculations are based mostly on metrics that indicate a company’s total size, not its growth.
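The rank-aggregation idea can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' R implementation: it uses only the five metrics printed in Table 3 for four of the companies, averages the ranks, and treats a higher value as better for every metric. The report does not say how ties were handled or in which direction PEG Ratio was ranked, so the helper names and those choices are hypothetical.

```python
def rank_desc(values):
    """Rank values so the largest gets rank 1; tied values share the better rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def aggregate_ranks(table, metrics):
    """Average each company's rank over the chosen metrics (lower is better)."""
    names = list(table)
    totals = {name: 0.0 for name in names}
    for metric in metrics:
        ranks = rank_desc([table[name][metric] for name in names])
        for name, rank in zip(names, ranks):
            totals[name] += rank
    scores = {name: totals[name] / len(metrics) for name in names}
    return sorted(scores.items(), key=lambda item: item[1])


# Four rows of Table 3, with the values as printed there.
metrics = ["market_cap", "enterprise_value", "revenue", "ebitda", "total_cash"]
companies = {
    "Verizon": dict(zip(metrics, [141, 186, 120, 34, 57])),
    "Google": dict(zip(metrics, [354, 297, 57, 18, 55])),
    "Xerox": dict(zip(metrics, [14, 20, 22, 3, 1])),
    "Quintiles": dict(zip(metrics, [5, 7, 4, 1, 1])),
}

ranking = aggregate_ranks(companies, metrics)
for name, score in ranking:
    print(f"{name:10s} mean rank = {score:.1f}")
# Verizon    mean rank = 1.4
# Google     mean rank = 1.6
# Xerox      mean rank = 3.0
# Quintiles  mean rank = 3.8
```

Because every metric here scales with company size, the largest firms inevitably rise to the top, which is the limitation noted above.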

Conclusions

Startups are essential for ensuring a competitive marketplace, and they are the front lines of innovation in both well-established industries and emerging new markets. The USA’s ability to foster the growth of fledgling startups and support the smaller, exciting companies with promise is a significant strength of its economy. The explosion of the software industry and the subsequent related markets like web, mobile, social networking, etc. was made possible by the growing number of investors trying to get in on the next big company from an early phase and the ever-enlarging pool of investment dollars available for early startup funding.

A number of factors influence startup funding, and we found that startup funding fluctuates with the economy to a large degree. Furthermore, startups with a connection to San Francisco, New York, Los Angeles, Boston, or any other major investment center were much more likely to succeed. Indeed, we confirmed that most startup funding occurs around major metropolitan areas, as one might expect.

Certain industries have become much more prominent for startup investment over the years. For the last decade, software has been the undisputed giant, with more software companies being founded yearly than in any other industry. Biotech has also seen steady growth, and more recently, new web, mobile app, and enterprise companies have become more common. Some industry sectors, like Software, are certainly “safer” than others, like Biotech and Cleantech, so the chance of success can indeed depend greatly upon the industry.

Our model was intended solely to give an investor insight into future investments. We found that the most predictive features of a startup’s success are the amount of money received, region, industry, and year of founding. Using just these features, one can predict with reasonable certainty which companies will go public or be acquired; one can assume that investors have more confidence in the companies that receive more funding. However, no model is perfect, and predicting a company’s future based solely on external factors like funding received and location cannot be all-encompassing, since it fails to take into account a company’s internal qualities, such as employee ability and the quality of its product or service.

Still, investors do a remarkable job of not wasting their money. The top 20 investors had very good success rates across the board, but it was interesting to note that most of them preferred an acquisition to an IPO. Very few companies manage to reach the IPO stage, and to get there one must reject many opportunities to cash in, which is not always in an investor’s best interest. However, the low shutdown rates make us suspect that survivorship bias is at work in the data.

Looking post-IPO, we tried to identify some factors that indicate a stable company, and to see which companies in our data set that began as startups managed to reach success. Big names like Verizon and Google dominate the list, as one might expect, but perhaps a more interesting analysis was to look at industries and their growth; we found that the emerging industries with more startups were also those with a much more unpredictable spread of 52-week percentage change in share price. Certain industries, like video gaming, appeared to offer a surer investment, while others, like advertising, seemed only to plummet post-IPO. This is interesting when analyzed in conjunction with the remarkably high rate of acquisition among advertising companies: perhaps advertising startups know that success in the public market is less likely for their industry, and so are more willing to cash out.

We believe we have painted an accurate and interesting picture of the startup landscape in the USA for the past 20 years. Startups drive the economy, and there seems to be an explosion in their prevalence after every new market emerges. The future holds promise and uncertainty for startups in the USA, but by analyzing their funding, we can make smart educated guesses on their future success and look to capitalize on smart investments.

About

This report is a joint project by Aayush Raman (Baylor College of Medicine and the University of Texas, MD Anderson Cancer Center), Yinsen Miao and Gabriel Rubio Breternitz (Department of Statistics, Rice University). A full PDF copy of the report and the related R script are available on request.

Introduction

Crunchbase released a dataset detailing approximately 18,500 Startups since 1920. The dataset contains a number of interesting features including startup locations, rounds of investments by type, early investors, monetary value of investments, company status and outcome. We used the dataset that was published on November 4th, 2013 and our central assumption made is that the companies’ funding information is reliable and trustworthy. We recognize that identifying a promising startup can be a tricky problem, because the most innovative startups tend to disrupt existing markets. However, this dataset allows us to “follow the money” and gain insights into what a successful Startup looks like, with a focus towards how an investor could exploit this knowledge for monetary gain. We follow the startups all the way to their IPO and beyond into the stock market to see which companies that came from humble beginnings transformed themselves into large and financially stable companies ruling the market today. For the IPO part, We used the stock information from Yahoo Finance to get the information about the public companies in our dataset.

Trends in Investments

While working on the time series dataset, we chose to look at two important factors: total investment and number of startup companies in a given year. These two factors are important because they can be indicative of the general state of the economy in the United States as well as the success of the startup companies. The Figure below provides detailed information about a pattern of investment since 1987.

Figure 1 The plot above shows the number of investment in United States from 1987 to 2013. The color filling indicates the number of Startup companies in that year. The six vertical red dotted lines represent important time events.

Figure 2 The heat map shows the temporal trend of different types of industry. ( Note that “Cleantech” industry refers to companies that look to be innovative in the area of biofuels, electric vehicles, solar panels and advanced nuclear technologies.)

One interesting thing to notice is that total investment rose briefly around when Google started in 1998. However, shortly after, investment declined in reaction to the implosion of the market following the bursting of the dot-com bubble or internet bubble. Then after 2004 America experienced an overall increase in investment, which coincides with when Facebook was founded. We may suggest the investors were in search of the next Google leading to an exponential increase in the investment. One possible reason for the swift increase of investment could be the increased popularity of Internet usage for social networking and marketing through sites like Orkut, Linkedin and small online businesses. From year 2005, the slope of investment decreased, since the U.S. housing bubble bursted and peaked in 2006. This was the time when the loans and investment from financial institutions decreased. That interbank credit crisis led to a 23% decrease in total investment.

The software companies seemed less affected by the financial recession however, since the number of software and mobile startup companies actually still increased ( shown in Figure 2 ). One explanation could be the ascent of smart phones like the iPhone and Android phones which created entirely new markets for app development and mobile applications. Total investment remained rather low from 2009 to 2010 and then it revived as the economic conditions improved. After investment reached its highest point in 2011, the graph again shows a decrease in seed investments and number of new startups founded. We may say that the investors are struggling to attract the best founders and make seed investments in promising companies. Also, we can say that most young accelerators and incubators seem destined to fail because of the overcrowded market for early stage funding. It appears that the market may have found a saturation point; in the past, investment tended to increase whenever a new market emerged, but there hasn’t been an entirely new market created due to technological advancement since the mobile app market emerged. Should another new technology prove to be transformative such that it creates more room for startups, then we could assume another spike in investments and number of startups founded.

Geographical Trends

The dataset contains 42 categories of companies, spread across 601 locations. The map in Figure 3 shows the state-by-state distribution of total investment dollars invested. Massachusetts, California, Texas and Washington are the top states for startups, since investors in those four states collectively raise the most money for new companies. Also note that in addition to a strong presence in the software industry, Massachusetts and SF Bay are inundated with Biotech companies as well. California has a large amount of small and diverse biotech companies whereas Massachusetts has a more focused segment of the Biotech industry, with most companies working in the field of drug discovery and development. Therefore, the investment is large as compared to other states. Texas and Washington also raise very high dollar amounts due to their advantageous-to-business tax environments – most notably, the lack of a state personal or corporate income tax.

As one might expect, most of the startups in our data set are clustered around major cities. However, we must be careful when making assumptions from this map, because our data set is biased towards companies that received some form of venture funding. Thus, there may be many startups in other places that did not receive funds, so they are not represented here.

Figure 3 The maps shows the geographic location of the investment and investment states received from 1987 to 2013. We can see the Top 10 Startup companies based on investment grouped into regions and by sum of investment received by each state.

Figure 4 the Heat Map Shows the Geographical Trend of Different Types of Industry.

The heat map in Figure 4 shows the top ten regions for startups and how they represent the top ten types of industries. It shows that San Francisco (SF) Bay area, Greater New York, Greater Los Angeles and Boston are the most popular hubs for startup companies of virtually every top industry. The software industry especially is omnipresent in all the ten region, with SF Bay area not surprisingly being the most popular one. The Biotech industry is dominant in SF Bay, Boston, Greater Los Angeles and San Diego, Washington DC and Seattle. We believe this could be due to the presence of elite universities and research institutes like Harvard and MIT in the Boston Area, Stanford and UC Berkeley around the SF Bay area, the University of Washington in Seattle, and the National Institute of Health and John Hopkins University around the DC area.

Different areas seem to engender different types of startups. New York has a strong showing in the advertising, web and software industries, but seems bizarrely lacking in Biotech and Cleantech startups one might expect to find in such a large and investor-dense region. Boston also has a noteworthy amount of  “enterprise” companies, which are geared more towards facilitating business functions rather than selling products or services to consumers. This could help explain the region’s remarkably dominant software industry, which rely a lot on the business-to-business types of services.

Type of Investment

The type of funding is an important criteria for the entrepreneurs to support the early costs associated with starting a business. Series A refers to the first round of stock offered to investors during early-stage rounds. Typical Series A rounds fall in the range of  $2-5M, with the offer options for 20-40% of the company, and are intended to support a company through the early stages of building a business, from product development to hiring to marketing. Series B refers to second-stage financing. Series C/C+ funding is used as the companies grows, they might continue to seek additional funds to meet future milestones for example Facebook and Twitter got the Series C+ funding. Angel Investors and Venture Capitalists are either rich individuals or small company that provides small seed funding for the new start up industry. They may also buy the stock of the public company in exchange of convertible debt, for example Warren Buffet bought stocks in Goldman Sachs during the recession period. Equity funding or crowd-funding is an umbrella term that refers to any means of financing your company in which you receive money in exchange for issuing shares of your stock.

Figure 5 Type, Rounds and Amount of Investment in Various Types of Industry.

The Figure 5 shows an obvious trend of the Financial institutions investing more than the individuals. Individual Investors are more interested in providing funds to the software, mobile and web as they are interested in immediate gains as compared to financial institutions. It is easily noticeable that the distributions of total rounds of Series A/B/C+ or Venture funding is more in case of software than biotech, although the total investments in biotech industry is more than software. This is obvious because the number of years to develop a product takes a lot of time and money in case of biotech industries compared to software products. This also suggests that — although the risk is high in biotech industry, the investors are interested in investing because the return of investment is very high as compared to software industry. For example: if a biotech company comes up with a drug that cures cancer or tuberculosis or a successful diagnostic product then percentage of profit is definitely higher than software industries.

Performance of Investors

We were curious about most important investors, and so we took a look at how successful different investors are at picking startups that will either be acquired or IPO or shut down. To do this, we looked at companies in the data set that were not currently operating and instead focused on those which had a definite end, be it a buyout, going public, or bankruptcy. Since we could not obtain data that could give us the insights on how much of a company was sold during a merger or acquisition or the amount of shares that a particular investor sold during the IPO, the link between fund outcomes and financial performance is tenuous. For example, in case of Google and Apple, an investor would have gained a lot if they would have kept its shares rather than selling them whereas, on the other hand in case of Zynga, a famous online video game company, the shares plummeted 4 times lower than the initial IPO stock price.

Figure 6 the Histogram Shows the Success of Top 20 Investors.

The Figure 6 suggests the top investors are Sequoia Capital, SV Angel and New Enterprise Accessories as they have the highest success rate. It is interesting to note that for most companies the top investors choose to fund, they end up getting acquired, and have a rather small chance of achieving an initial public offering. This seems to reinforce the trend that many startups prefer leveraging their success into a buyout and avoid the uncertainty of going public. Furthermore, due to the rather unpredictable nature of venture capital, it is likely that investors prefer a “sure thing” like an acquisition to a company that IPO’s. It’s important to note that likely the “Closed” category is heavily underrepresented in the data set which hints towards the case of “survivorship bias” in the data set. Survivorship bias is the logical error of concentrating on the companies that “survived” and inadvertently overlooking those that did not leading to lack of visibility. Since, the companies and individuals don’t like to put up their failed attempts on display it is easy to neglect the failures. Indeed, finding data on failed companies proved to be a monumentally difficult task when cleaning the data set. Another interesting thing to point out is KPCB’s remarkably high IPO rate; indeed you were more likely to go public if KPCB invested than any other investor.

Predicting a Start-up’s Success

We explored building a generalized linear model to see whether we could predict if a start-up will succeed (IPO or acquisition) or fail (close or shut down). We fit a regularized logistic regression model using predictive features such as company category, geographical region, rounds of funding, total funding and the year in which the start-up was founded. Using this model, we can try to predict which of the still-operating companies in the dataset are likely to succeed and which may shut down. Note that this may not be a fair test of prediction, because younger companies may not have had enough time to build up their feature vectors. Therefore, we restricted the analysis to companies founded between 1997 and 2011. The features are a mix of continuous numerical and categorical variables: the company category, region, and founding year were categorical. Categorical features are converted to numerical ones using the standard trick of expanding the feature set, transforming a categorical feature with K possible values into K features, each with a binary value.
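The encoding and model described above can be sketched as follows. This is a minimal illustration in Python, not the R code behind the report: the rows are made up, the founding year is left out for brevity, and the L2 penalty, learning rate and the helper names (one_hot, min_max, fit_logistic_l2) are all assumptions chosen for the example.

```python
import math

def one_hot(values):
    """Turn a categorical column with K distinct values into K binary columns."""
    levels = sorted(set(values))
    return levels, [[1.0 if v == level else 0.0 for level in levels] for v in values]

def min_max(values):
    """Rescale a numeric column to [0, 1] so gradient descent behaves well."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Predicted probability of success for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic_l2(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss plus (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n for j in range(d)]
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errors) / n  # the intercept is not penalised
    return w, b

# Made-up rows for illustration: (category, region, rounds, log10 funding, outcome)
rows = [
    ("software", "SF Bay", 5, 8.0, 1),
    ("software", "SF Bay", 4, 7.6, 1),
    ("biotech", "Boston", 4, 7.9, 1),
    ("hardware", "Champaign", 1, 5.5, 0),
    ("software", "Champaign", 1, 5.8, 0),
    ("biotech", "Champaign", 2, 6.0, 0),
]
cat_levels, cat_cols = one_hot([r[0] for r in rows])
reg_levels, reg_cols = one_hot([r[1] for r in rows])
rounds = min_max([r[2] for r in rows])
funding = min_max([r[3] for r in rows])
X = [c + g + [k, f] for c, g, k, f in zip(cat_cols, reg_cols, rounds, funding)]
y = [r[4] for r in rows]

w, b = fit_logistic_l2(X, y)
probs = [predict_proba(w, b, x) for x in X]
```

Each category and region becomes its own 0/1 column, so the three-level category and three-level region above contribute six of the eight features. The penalty shrinks the weights of rare levels toward zero, which matters when many regions appear only a handful of times.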

| Company | Type | Region | Founded | Success |
| --- | --- | --- | --- | --- |
| SurveyMonkey | software | SF Bay | 1999 | 0.99 |
| Sigmacare | health | New York | 2005 | 0.99 |
| ZeniMax | games video | Washington DC | 1999 | 0.98 |
| Twitter | social | SF Bay | 2006 | 0.97 |
| Bloom Energy | cleantech | SF Bay | 2002 | 0.97 |
| Fisker | automotive | Los Angeles | 2008 | 0.97 |
| Pinterest | social | SF Bay | 2009 | 0.96 |
| Solyndra | manufacturing | SF Bay | 2005 | 0.96 |
| LivingSocial | ecommerce | Washington DC | 2007 | 0.95 |
| GreatPoint En | cleantech | Boston | 2004 | 0.93 |

Table 1 Predicting the Successful Companies.

| Company | Type | Region | Founded | Failure |
| --- | --- | --- | --- | --- |
| JumpTheClub | mobile | Hartford | 2010 | 0.99 |
| PureHistory | search | Somerset | 2011 | 0.99 |
| Striped Sail | hardware | Champaign | 2010 | 0.99 |
| Apps Genius | games video | Red Bank | 2009 | 0.99 |
| Energy Web | ecommerce | Allentown | 2004 | 0.99 |
| GlucoSentient | biotech | Champaign | 2011 | 0.99 |
| ANDalyse | other | Champaign | 2005 | 0.99 |
| Forcura | biotech | Jacksonville | 2010 | 0.99 |
| Southtree | web | Chattanooga | 2009 | 0.99 |
| PrintEco | software | Champaign | 2010 | 0.99 |

Table 2 Predicting the Shutdown Companies.

The most predictive features were total funding, industry category (especially Biotech, Cleantech, Software, Video Games and Hardware) and region. Table 1 lists the companies that the model predicts have a great chance of being acquired or going public, whereas Table 2 lists the companies it predicts are most likely to shut down. The Success and Failure columns give the predicted probability of success and of failure (1 - Success), respectively. An interesting result was the prediction of Twitter’s IPO, which occurred on 6 November 2013, two days after the data set was published on 4 November 2013. We were pleasantly surprised by the accuracy of our results, but the model is still not perfect. Another company it rates as highly likely to have an IPO is Solyndra, which famously went bankrupt amid accusations of fraud and of misrepresenting corporate finances to obtain government funding. This is an important caution against taking the predictions of the linear model at face value. Still, we believe that even this failure is instructive, because our model is mostly predicated upon the amount of funds raised; Solyndra had a management dispute that could not be predicted by any of the important features in the model. An argument could be made that the case of Solyndra was anomalous. Note also that the model was trained on a much smaller set than the set it makes predictions for, so a reasonable amount of overfitting may have taken place. Furthermore, because the model was trained to predict success, it does not do quite as good a job of predicting failures, since the features that predict success are not necessarily the same ones that predict failure. The model is likely unfit for any true prediction, but it nevertheless performed well enough to make a case for which features are the most predictive when looking at early investment in start-up companies.

Post-IPO

We wanted to look at the companies in our dataset that had an IPO to see if we could identify any notable trends. To do this, we first had to discover the companies’ ticker symbols. We built a function, find_symbol, that called Yahoo! Finance’s search function with the company name as input and returned the first search result. However, because the dataset only has information about companies in their startup phase, the search did not work properly for many companies that had since gone through significant changes such as mergers, post-IPO bankruptcies, buyouts, name changes, and ticker symbol changes. Other websites’ search functions had the same issue. To overcome this, we fed the function a dummy value for each company for which the search failed to return the correct symbol and found the missing symbol manually. Because find_symbol queries a website, it runs very slowly; to avoid repeating the queries, we saved the symbols in a file.
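The lookup, manual correction and caching steps can be sketched roughly as below. This is an illustration in Python rather than the report's R code: stub_search stands in for the web query and makes no network call, "Renamed Example Co" and its symbol are hypothetical, and the JSON cache format is an assumption.

```python
import json
import os
import tempfile

def find_symbols(names, search, manual_overrides, cache_path):
    """Return {company name: ticker}, querying `search` only for uncached names."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            cache = json.load(fh)
    for name in names:
        if name in cache:
            continue  # already resolved in an earlier run
        if name in manual_overrides:
            cache[name] = manual_overrides[name]  # hand-checked symbol wins over search
        else:
            cache[name] = search(name)  # slow step in practice: one web query per company
    with open(cache_path, "w") as fh:
        json.dump(cache, fh, indent=2)
    return {name: cache[name] for name in names}

calls = []  # records which names actually reached the (stubbed) search

def stub_search(name):
    """Stand-in for a website search that returns the first matching symbol."""
    calls.append(name)
    return {"Google": "GOOG", "Twitter": "TWTR"}.get(name)

manual = {"Renamed Example Co": "RXC"}  # hypothetical company and symbol
cache_file = os.path.join(tempfile.mkdtemp(), "symbols.json")
names = ["Google", "Twitter", "Renamed Example Co"]

first = find_symbols(names, stub_search, manual, cache_file)
second = find_symbols(names, stub_search, manual, cache_file)  # served from the file
```

The second call never reaches the search function, which is the point of saving the symbols: the slow queries are paid once, and manual corrections live in one place.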

In the base data set, 377 companies are listed as having had an IPO. After running the find_symbol function, we had symbols for 332 active companies. We chose to exclude companies that were bought out, had a merger, or went bankrupt, both because we wanted to focus on how the companies are performing currently and because finding historical stock data for companies that no longer exist or that underwent some significant change proved to be very difficult.

We used these symbols to query Yahoo! Finance with a function called getKeyStats_xpath, which scrapes a company profile page for any relevant valuation statistics. This function returned many interesting features, such as Market Cap, Revenue and EBITDA. We restricted our analyses to the following features because we found them more reliable and they had few NA values: Market Cap, Enterprise Value, Enterprise Value/Revenue, Enterprise Value/EBITDA, Revenue, Revenue Per Share, Gross Profit, EBITDA, Total Cash, Total Cash Per Share, 52-Week Change, Shares Outstanding, Float, % Held by Insiders, PEG Ratio.

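| Company | Market Cap | Enterprise Value | Revenue | EBITDA* | Total Cash |
| --- | --- | --- | --- | --- | --- |
| Verizon | 141 | 186 | 120 | 34 | 57 |
| Google | 354 | 297 | 57 | 18 | 55 |
| Raytheon | 27 | 29 | 24 | 3 | 4 |
| Lockheed Martin | 43 | 49 | 46 | 5 | 3 |
| Texas Instru | 46 | 48 | 12 | 4 | 4 |
| Honeywell | 68 | 72 | 38 | 6 | 6 |
| Xerox | 14 | 20 | 22 | 3 | 1 |
| News Corp | 10 | 7 | 9 | 1 | 3 |
| General Elec | 270 | 652 | 145 | 29 | 10 |
| Quintiles | 5 | 7 | 4 | 1 | 1 |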

Table 3 Top 10 IPO Companies.

Breaking down the percentage change in share price over the last 52 weeks by industry gives a rough picture of which industries are booming, busting, predictable, or volatile. Figure 7 shows the top industries identified earlier plus any industry with more than 15 companies represented in the data set. We can see that biotech, software, and cleantech have higher variances, making investment in those three industries high-risk, high-reward. Surprisingly, the safest industry for company growth currently appears to be video games, with an average share price increase of over fifty percent. The worst-performing industries in terms of growth were advertising and e-commerce. Note that this graphic must be interpreted with skepticism, because a number of factors can affect share price growth. For example, the youngest companies tend to have much larger percentage changes in absolute value because they are smaller, so industries with more young companies may have inflated figures. Furthermore, because this data excludes companies that failed post-IPO, had a merger, or were bought out, one cannot make inferences about these outcomes, which is likely what an investor looking at percentage growth would want to do.
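A breakdown of this kind amounts to grouping companies by industry and comparing the mean and spread of their 52-week change. The sketch below uses Python with made-up numbers; the function name, the min_companies parameter and the use of the sample standard deviation as the measure of spread are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize_by_industry(records, min_companies=2):
    """Group (industry, pct_change) pairs and report size, mean and spread per group."""
    groups = defaultdict(list)
    for industry, pct_change in records:
        groups[industry].append(pct_change)
    summary = {}
    for industry, changes in groups.items():
        if len(changes) < min_companies:
            continue  # too few companies for the average to mean much
        summary[industry] = {
            "n": len(changes),
            "mean": mean(changes),
            "stdev": stdev(changes),  # sample standard deviation, needs n >= 2
        }
    return summary

# Made-up 52-week percentage changes, for illustration only.
records = [
    ("games_video", 40.0), ("games_video", 60.0), ("games_video", 50.0),
    ("biotech", -50.0), ("biotech", 150.0), ("biotech", 20.0),
    ("advertising", -30.0), ("advertising", -10.0),
]
summary = summarize_by_industry(records, min_companies=3)
```

With min_companies=3 the two advertising rows are dropped, mirroring the idea of showing only industries with enough companies. A high mean with a low standard deviation reads as "safe growth", while a wide spread reads as high-risk, high-reward.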

Figure 7 The histogram shows the success of the top 20 industries.

Another interesting comparison was between Enterprise Value (EV)/Revenue (R) and Enterprise Value/EBITDA. These two ratios are popular for assessing the value of a company. EV/R relates the company’s value to its ability to generate funds, while EV/EBITDA relates it more directly to the company’s profits. We expected these numbers to be highly correlated, but found that they are not. The medical industry is a good example: it has an average EV/R multiple of around 10, which seems high, yet its average EV/EBITDA was around -2.7. A negative EV/EBITDA typically reflects negative EBITDA at some of these companies, which suggests that medical companies are valued highly relative to their revenue but do not enjoy very large profit margins. However, one must remember that there are many possible scenarios behind any one ratio value, so not much weight can be given to these statistics alone. Furthermore, it is generally considered meaningless to compare investment multiples across industries; it is better to use them to compare similar companies. Finally, the number of companies being compared per industry is very low, so a very large or a very small company could skew the industry average by a significant amount.
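To make the two multiples concrete, the sketch below computes them for four made-up companies and measures how strongly they move together with a Pearson correlation. It is written in Python for illustration; the figures, company labels and helper names are assumptions, not values from the data set.

```python
import math

def multiples(company):
    """Return (EV/Revenue, EV/EBITDA) for one company record."""
    ev = company["enterprise_value"]
    return ev / company["revenue"], ev / company["ebitda"]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up figures in arbitrary units; company A is loss-making (negative EBITDA).
companies = {
    "A": {"enterprise_value": 100.0, "revenue": 10.0, "ebitda": -20.0},
    "B": {"enterprise_value": 60.0, "revenue": 30.0, "ebitda": 6.0},
    "C": {"enterprise_value": 80.0, "revenue": 20.0, "ebitda": 10.0},
    "D": {"enterprise_value": 90.0, "revenue": 45.0, "ebitda": 9.0},
}
ev_rev, ev_ebitda = zip(*(multiples(c) for c in companies.values()))
r = pearson(ev_rev, ev_ebitda)
```

Company A shows how a high EV/Revenue can sit next to a negative EV/EBITDA: the market values it at ten times its sales even though it currently loses money. A single company like this is enough to pull the correlation between the two multiples away from the expected positive value, and to drag an industry average below zero.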

We also wanted to identify the “best” companies in our data set that are currently publicly traded. To do this, we ranked every company on a number of key metrics and then calculated an aggregate rank score from those rankings. We ranked the companies on the following metrics: Market Cap, Enterprise Value, Revenue, Revenue Per Share, Gross Profit, EBITDA, Total Cash, and PEG Ratio.
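One simple way to build such an aggregate score is to rank the companies on each metric and sum the ranks. The Python sketch below does this for three rows and the five columns shown in Table 3. It assumes that a larger value is better on every metric and breaks ties arbitrarily; the report's actual scoring, which used eight metrics, may differ in both respects.

```python
def aggregate_rank(companies, metrics):
    """Sum per-metric ranks (1 = largest value); a lower total means a better company."""
    totals = {name: 0 for name in companies}
    for metric in metrics:
        ordered = sorted(companies, key=lambda name: companies[name][metric], reverse=True)
        for position, name in enumerate(ordered, start=1):
            totals[name] += position
    return sorted(totals.items(), key=lambda item: item[1])

metrics = ["market_cap", "enterprise_value", "revenue", "ebitda", "total_cash"]
# Three rows taken from Table 3, in the table's own units.
companies = {
    "Verizon": dict(zip(metrics, [141, 186, 120, 34, 57])),
    "Google": dict(zip(metrics, [354, 297, 57, 18, 55])),
    "Raytheon": dict(zip(metrics, [27, 29, 24, 3, 4])),
}
ranking = aggregate_rank(companies, metrics)
```

Because every metric here measures size in one form or another, the score rewards large companies by construction. A metric where smaller is arguably better, such as the PEG ratio, would need its sort order reversed before being added to the total.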

This algorithm identified the top ten public companies in our data set, shown in Table 3. Not surprisingly, nearly all of these company names are well known, and they are all very large and stable companies. These companies would be a good “safe bet” investment. However, investment in these companies is not guaranteed to deliver very high returns, because the algorithm’s calculations are based mostly on metrics that indicate a company’s total size, not its growth.

Conclusions

Startups are essential for ensuring a competitive marketplace, and they are the front lines of innovation in both well-established industries and emerging new markets. The USA’s ability to foster the growth of fledgling startups and support the smaller, exciting companies with promise is a significant strength of its economy. The explosion of the software industry and the subsequent related markets like web, mobile, social networking, etc. was made possible by the growing number of investors trying to get in on the next big company from an early phase and the ever-enlarging pool of investment dollars available for early startup funding.

A number of factors influence startup funding, and we found that startup funding fluctuates with the economy to a large degree. Furthermore, startups with a connection to San Francisco, New York, Los Angeles, Boston, or any other major investment center were much more likely to succeed. Indeed, we confirmed that most startup funding occurs around major metropolitan areas, as one might expect.

Certain industries have become much more prominent for startup investment over the years. For the last decade, software has been the undisputed giant, with more software companies being founded yearly than in any other industry. Biotech has also seen steady growth, and more recently, new web, mobile app, and enterprise companies have become more common. Some sectors, like Software, are certainly “safer” than others, like Biotech and Cleantech, and therefore the chance of success can indeed depend greatly on industry.

Our model was intended solely to give an investor insight into future investments. We found that the most predictive features of a startup’s success are the amount of money received, region, industry, and year of founding. Using just these features, one can predict with reasonable certainty which companies will go public or be acquired, since more funding can be taken as a sign of investor confidence in a company. However, no model is perfect, and predicting a company’s future based solely on external factors like funding received and location cannot be all-encompassing, since it fails to take into account the strength of a company’s internal qualities, such as employee ability and the quality of its product or service.

Still, investors do a remarkable job of not wasting their money. The top 20 investors had very good success rates across the board, but it was interesting to note that most of them preferred an acquisition to an IPO. Very few companies manage to reach the IPO stage, and to get there one must reject many opportunities to cash in, which is not always in an investor’s best interest. However, the low shut-down rates make us suspect that survivorship bias is at work in the data.

Looking post-IPO, we tried to identify some factors that indicate a stable company, and to see which companies in our data set that began as startups managed to reach success. Big names like Verizon and Google dominate the list, as one might expect, but perhaps a more interesting analysis was to look at industries and their growth; we found that the emerging industries with more startups were also those with a much less predictable spread in the 52-week percent change in share price. Certain industries, like video gaming, appeared to offer a surer investment, while others, like advertising, seemed only to plummet post-IPO. This is interesting when analyzed in conjunction with the remarkably high rate of acquisition among advertising companies; perhaps advertising startups know that success in the market is less likely in their industry, and so are more willing to cash out.

We believe we have painted an accurate and interesting picture of the startup landscape in the USA for the past 20 years. Startups drive the economy, and there seems to be an explosion in their prevalence after every new market emerges. The future holds promise and uncertainty for startups in the USA, but by analyzing their funding, we can make smart educated guesses on their future success and look to capitalize on smart investments.

About

This report is a joint project by Aayush Raman (Baylor College of Medicine and the University of Texas, MD Anderson Cancer Center), Yinsen Miao and Gabriel Rubio Breternitz (Department of Statistics, Rice University). A full PDF copy of the report and the related R script are available on request.
