Dark Data lives inside company systems. It's the data that is hard to find if you're not on the team that created it.

Rob Spiegel

July 12, 2016

Crawling the Dark Places for Data

With a plethora of new data-gathering tools swarming the market, an increasing amount of data is being collected. Yet some of the most important data is going unused because it can’t be found by the right person at the critical time. With all the data coming over the transom, there is plenty of “Dark Data” that isn’t getting analyzed. Consequently, potentially useful trends are missed. Some companies are seeking advances in data collection and storage to help solve the problem. Even machine learning is coming into play so the data you need can find you.

Dark Data is quietly becoming an important issue in product design and production. According to data management experts, product companies need to spend more time analyzing data, instead of just collecting it in overwhelming volume. The dark unused data may hide critical secrets about products, from parts information to customer-use data. This is data companies already have without knowing it.

What Is Dark Data and Where Does it Hide?

Dark Data is a relatively new term. It comes on the heels of Big Data. Once you decide your data is useful, then you have to make sure you can find that data. “The term Dark Data has popped up a lot recently,” Greg Milliken, VP of marketing for M-Files, a data management company, told Design News. “In our view, Dark Data is data that can’t be located when it’s highly relevant and helpful.”

The challenge of Dark Data is that in the past it wasn’t viewed as valuable after it was created. Tons of product data lives in CAD files. That data is becoming useful to those who are preparing the product for manufacturing, but it’s often not available to the production team. “Companies are putting information in different systems or silos. There is all this structured data in customer relations management systems and in ERP systems,” said Milliken. “People look at different things differently depending on their different roles. The way one person stores the data is different than someone else.” He noted that this puts the onus on a company’s governance to teach staff how to store the data in retrievable form.


Product data usually stays in the silo where it was created, and typically, those in one silo can’t search the data in a different silo. “When the data is not within reach, it’s literally not visible. That can happen because of the way it’s been classified, or it can happen because the data is in a system that the person who needs it doesn’t use,” said Milliken. “If you can enable people to access the information in a fashion that makes it useful to them, you solve the problem.”

Organize it So it Can Be Found

One way to keep data within reach of users is to describe it in terms that become part of a company’s dictionary. “If we use the meta-data approach, we organize the data by what it is rather than where it’s stored. We want to find it by accurately describing it,” said Milliken. “Our system is wired to enable us to access the taxonomy by using standard keywords and tags. As people use the system, they’re curating it. They’re contributing to it and also deriving value from it.”
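
To make the "what, not where" idea concrete, here is a minimal Python sketch of a tag-based index. It is an illustration only, not M-Files' implementation: the `Record` and `MetadataIndex` names, the silo labels, and the sample file names are all hypothetical. Each record keeps a note of the system it lives in, but lookups go through descriptive tags alone.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """A document described by what it is; `system` only notes where it lives."""
    name: str
    system: str  # hypothetical silo label, e.g. "CAD", "ERP", "CRM"
    tags: frozenset = field(default_factory=frozenset)


class MetadataIndex:
    """Maps each lowercase tag to the set of records that carry it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, record):
        for tag in record.tags:
            self._by_tag[tag.lower()].add(record)

    def find(self, *tags):
        """Return records carrying every requested tag, from any system."""
        if not tags:
            return set()
        matches = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches)


index = MetadataIndex()
index.add(Record("bracket_rev_c.dwg", "CAD", frozenset({"bracket", "drawing"})))
index.add(Record("bracket supplier quote", "ERP", frozenset({"bracket", "quote"})))
index.add(Record("bracket field complaint", "CRM", frozenset({"bracket", "complaint"})))

# One search by description reaches records in all three silos.
for record in sorted(index.find("bracket"), key=lambda r: r.name):
    print(record.system, record.name)
```

The caller never names a system. A search for "bracket" surfaces the drawing, the quote, and the complaint together, which is the behavior Milliken describes as finding data by context.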

Making data visible also means you have to allow users to search across a variety of systems that may contain data from initial design or customer use. “You have to create an inviting and creative way for people to get data from multiple systems. You have to move away from silos to an environment where those silos get broken down. That’s somewhat in the future, but not far into the future,” said Milliken. “It’s about seeing the data as ‘what’ versus ‘where.’ You don’t have to remember where it was created or placed. You find it all by context.”

Let the Data Find You

Just as Google and Amazon learn your preferences, data management systems can learn a company’s terminology preferences in order to help simplify cross-function searches. “One of the promising solutions is outside the users’ explicit tagging. It’s the notion of using machine-learning activities to evolve data management,” said Milliken. “If someone in sales is looking for certain information in a certain way, it teaches the organization to store the information in that certain way. It’s a matter of teaching the information to find the user.”


The machine-learning tools can guide users to create tags that make the data easier to reach no matter who seeks it. “You can use text analytics to teach users to automatically classify information. Those tools improve with use. As people add their terms, the system can automatically classify the data, even if it’s not perfect,” said Milliken. “Even if we just automate most things, machine learning can access user behavior and infer things about information. When we tie it together with meta-data, we think we’re solving the problem.”
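
As a rough illustration of a system that improves as people add their terms, the Python sketch below suggests tags for new text based on tags users have already applied to earlier documents. It is a toy word-to-tag co-occurrence counter standing in for real text analytics, not the method M-Files uses; the `TagSuggester` class, its methods, and the sample sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TagSuggester:
    """Counts how often each word appears in documents carrying each tag."""

    def __init__(self):
        self._word_tag_counts = defaultdict(Counter)

    def learn(self, text, tags):
        """Record one user-tagged document."""
        for word in set(tokenize(text)):
            for tag in tags:
                self._word_tag_counts[word][tag] += 1

    def suggest(self, text, top_n=3):
        """Rank tags by how often they co-occurred with the words in `text`."""
        scores = Counter()
        for word in set(tokenize(text)):
            if word in self._word_tag_counts:
                scores.update(self._word_tag_counts[word])
        return [tag for tag, _ in scores.most_common(top_n)]


suggester = TagSuggester()
suggester.learn("torque spec for motor bracket", ["engineering"])
suggester.learn("customer complaint about motor noise", ["support"])
suggester.learn("motor bracket drawing revision", ["engineering"])

print(suggester.suggest("bracket torque revision"))  # ['engineering']
```

Every tag a user applies adds to the counts, so suggestions sharpen with use. They will not always be right, which matches Milliken's caveat that automatic classification is useful even when it is not perfect.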

Ultimately, a smart system should be able to find the data no matter who seeks it, whether it’s someone from the design team or someone in sales. “It’s about creating a more unified environment where information can be accessed in common ways. There is no monolithic way to get things done, but making it easy to move from one information system to another helps,” said Milliken. “The PLM vendors are trying to solve the case, but often the PLM doesn’t reach far beyond the engineering side.”

He said he sees a solution that works for anyone who needs the data, no matter how they’re accustomed to searching. “The way I want to consume the data may be different than the way others want to consume it, but you need to reach it one way or the other,” said Milliken. “So you have to create an intelligent layer so you can reach information virtually.”

[image via M-Files]

Rob Spiegel has covered automation and control for 15 years, 12 of them for Design News. Other topics he has covered include supply chain technology, alternative energy, and cyber security. For 10 years he was owner and publisher of the food magazine Chile Pepper.

About the Author(s)

Rob Spiegel

Rob Spiegel serves as a senior editor for Design News. He started with Design News in 2002 as a freelancer and hired on full-time in 2011. He covers automation, manufacturing, 3D printing, robotics, AI, and more.

Prior to Design News, he worked as a senior editor for Electronic News and Ecommerce Business. He has contributed to a wide range of industrial technology publications, including Automation World, Supply Chain Management Review, and Logistics Management. He is the author of six books.

Before covering technology, Rob spent 10 years as publisher and owner of Chile Pepper Magazine, a national consumer food publication.

As well as writing for Design News, Rob also participates in IME shows, webinars, and ebooks.
