Data Collection and Sharing
 


Data Sharing

Long before cloud computing, companies were sharing vital information with customers, partners, vendors, and contractors to make business processes run more efficiently and economically. They started with Web commerce, then moved into mobile applications and social networking. Each new information-sharing program opened up another hole in the corporate information security armor. While traditional security was focused on keeping people out of the data center, new security processes needed to be implemented to ensure that only the appropriate people were getting in. Cloud computing is another step on the continuum, and it also raises the stakes. Hosting vital data and applications on a cloud provider’s infrastructure puts vital information outside the corporate wall. Even more importantly, it creates a new set of users who have full access privileges to your data and applications — namely the cloud service administrators.

Between those two factors, organizations are ceding a risky amount of control over vital information.

Too often, without realizing it, they rely on nothing more than trust to keep their data safe. They trust that the right people have the right access to vital information and will use it for the right things, yet they don’t really know who they’re trusting because they don’t know who all of those users are. Their service provider tells them to trust that they are managing user access effectively. Trust, in this context, is a flimsy defense.

Regaining control over vital data means focusing on an often-overlooked aspect of data access: identity and access management (IAM). IAM encompasses the business processes and technology automation systems used to provision access, calculate risks to information resources, and eliminate those risks quickly and efficiently. It approaches data security from three angles: ensuring that an appropriate user access policy is set, understanding and identifying who your users are, and granting them access appropriate to their roles in the organization.

Policies and roles are central to an effective IAM program. Roles, defined by business managers, enable organizations to classify users in groups and assign them appropriate access privileges based on what they need to do their jobs. A role for front-line retail employees, for example, might include an e-mail account, access to their wage and benefit information, POS systems, file servers hosting relevant documents, and remote access privileges. Their manager’s role would include all of those privileges, plus access to time and accounting systems, employee evaluation files, customer and inventory databases, and more. The company might create a role for suppliers to allow them to see inventory information and the ordering system. A role for partners might give them access to the project management system to check on the status of joint ventures. Classifying users according to their roles takes the unknown out of the user access equation. The organization “knows” its users according to their roles.
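
The role examples above can be sketched as a simple mapping from roles to privileges. This is only an illustration in Python, not a description of any particular IAM product; the role names, the privilege labels, and the can_access helper are hypothetical names chosen to mirror the examples in the text.

```python
# Hypothetical role definitions mirroring the examples in the text.
# Privilege names are illustrative labels, not identifiers from a real system.
RETAIL_EMPLOYEE = frozenset({
    "email",
    "wage_and_benefits",
    "pos",
    "file_servers",
    "remote_access",
})

ROLES = {
    "retail_employee": RETAIL_EMPLOYEE,
    # The manager role includes every employee privilege plus extras.
    "manager": RETAIL_EMPLOYEE | {
        "time_and_accounting",
        "employee_evaluations",
        "customer_database",
        "inventory_database",
    },
    "supplier": frozenset({"inventory_info", "ordering_system"}),
    "partner": frozenset({"project_management"}),
}


def can_access(role: str, resource: str) -> bool:
    """Return True if the named role grants access to the resource.

    Unknown roles get no access (deny by default).
    """
    return resource in ROLES.get(role, frozenset())


print(can_access("manager", "pos"))
print(can_access("supplier", "customer_database"))
```

Because every privilege is attached to a role rather than to an individual, answering "who can reach the customer database?" reduces to listing the roles that contain that privilege.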

Technology can be a significant asset to help organizations with this challenging process. IAM software systems automate key functions such as role definition and management, provisioning access privileges, access verification and certification, and password management. Automation makes IAM fast, agile and scalable, compared to ponderous and expensive manual systems. Combined with data analytics applications, they can identify associations and patterns that might violate compliance guidelines and company policies, or indicate hidden risks. Organizations that build this kind of infrastructure can implement any kind of information sharing program without creating unreasonable risk to their vital information assets.

Automation effectively helps organizations through all the phases of the identity and access lifecycle:
  • defining policies for who should have access to what;
  • automatically enforcing these policies for everyone who joins, moves within, or leaves the organization;
  • verifying that the access is appropriate; and
  • providing ongoing monitoring and managing access risk on a continuous, near real-time basis.
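
The lifecycle phases listed above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the POLICY table, the AccessRegistry class, and its method names are all hypothetical, and a real IAM system would add approvals, audit trails, and integration with the systems being protected.

```python
from dataclasses import dataclass, field

# Hypothetical policy: which privileges each role should have.
POLICY = {
    "retail_employee": {"email", "pos"},
    "manager": {"email", "pos", "inventory_database"},
}


@dataclass
class AccessRegistry:
    """Tracks each user's role and the privileges actually granted."""
    roles: dict = field(default_factory=dict)    # user -> role
    grants: dict = field(default_factory=dict)   # user -> set of privileges

    def join(self, user: str, role: str) -> None:
        """Joiner: provision exactly what the policy allows for the role."""
        self.roles[user] = role
        self.grants[user] = set(POLICY[role])

    def move(self, user: str, new_role: str) -> None:
        """Mover: replace old privileges instead of accumulating them."""
        self.join(user, new_role)

    def leave(self, user: str) -> None:
        """Leaver: de-provision all access."""
        self.roles.pop(user, None)
        self.grants.pop(user, None)

    def verify(self) -> dict:
        """Return, per user, any granted privileges the policy does not allow."""
        violations = {}
        for user, granted in self.grants.items():
            excess = granted - POLICY[self.roles[user]]
            if excess:
                violations[user] = excess
        return violations


registry = AccessRegistry()
registry.join("alice", "retail_employee")
registry.move("alice", "manager")
registry.join("bob", "retail_employee")
registry.grants["bob"].add("inventory_database")  # simulate an out-of-band grant
print(registry.verify())   # reports bob's excess privilege
registry.leave("bob")
print(registry.verify())   # nothing left to report
```

Running verify on a schedule is one way to approximate the continuous monitoring described in the last phase: any access that drifts away from policy shows up as a violation to review.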

Let’s circle back to cloud computing. It carries significant risk due to the location of potentially sensitive information and the need to share that information with a wide variety of individuals. The fundamental technology of cloud computing is no less secure than conventional networking technologies; often it may be more secure. The concern stems from giving up control of key assets and data without effectively managing the risk of doing so. A new class of users, cloud provider administrators, will have high-level access to that sensitive information, and that access must be effectively managed. Giving up the infrastructure and storage of the data does not mean an organization gives up responsibility for managing user access to it. Organizations must work with their service provider to ensure tight policies on which individuals have access to that data for administrative purposes. They need to understand where their data is located; have ongoing controls to provision new users and de-provision users who no longer require access; and know at all times who has access to the data and what they are doing with it. Controlling who can get to the data is the first and most important part of due diligence.

Ensuring that information can quickly and easily get to employees, partners, and customers is becoming a critical business requirement. But opening information systems without the necessary controls exposes the organization to significant vulnerability. Customers, partners, vendors, employees, and contractors must be able to access your information from anywhere and through any means, from a cloud infrastructure to a hand-held device. This free flow of information helps business run faster, smoother, and at lower cost. The advance of technologies like IAM will help ensure that it also flows securely.

 


Data Collection

Data collection is any process of preparing and collecting data, for example, as part of a process improvement or similar project. The purpose of data collection is to obtain information to keep on record, to make decisions about important issues, or to pass information on to others. Data are primarily collected to provide information regarding a specific topic.

Data collection usually takes place early on in an improvement project, and is often formalised through a data collection plan, which often contains the following activities:

  • Pre-collection activity: agree on goals, target data, definitions, methods
  • Collection: data collection
  • Present findings: usually involves some form of sorting[3], analysis and/or presentation

Data collection by cloud companies
Social media data is one of the easiest and cheapest ways of understanding the personality and preferences of an individual, which makes online data collection a major trend in the marketing industry.
What you may not know is that, apart from advertising their brands and offers through social media websites, these companies also keep track of the things you post on your social networking site.

It is fun to post things on your friend’s forum or to create your own blog, but with all these posts and writings being public, you must not overlook the fact that everything you write online is scrutinized and used by companies looking to get in touch with you for various reasons.

Experts have differing views on data mining. Some believe that when banks or other companies use the casual chats and posts of their users, it can provoke a backlash; others insist that it is nothing but a harmless way of making things easy and customized for both the prospective users and the interested companies. It’s like when a user leaves a recommendation on an online bookstore about a particular book: the bookstore learns that this user likes to buy a particular set of books online.

How is data collection done?
It is amazing to see how a data collecting company can create a dossier about a person without ever meeting him or her. A data collecting company basically collects information such as a person’s financial picture, purchase preferences, and more.
If you are wondering how online data collecting works and why it sounds scary, you must realize that a data mining company does nothing to affect you, or any other user, personally or professionally. Using basic online posts, casual forum chats, and blog-based information, these companies build a behavioral pattern for each individual and turn it into an information chart of that person’s likely preferences and status. This collection of online information and behavioral patterns is based on publicly posted data and thus does not impact a person in any negative way.
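
As a rough illustration of how public posts could be turned into a preference chart, the Python sketch below counts topic keywords across a user’s posts. It is a toy example only: the topic list, the keywords, and the build_profile function are hypothetical, and nothing here describes how any actual data collecting company works.

```python
from collections import Counter
import re

# Hypothetical topic keywords; a real system would use far richer signals.
TOPIC_KEYWORDS = {
    "books": {"book", "novel", "author"},
    "credit_cards": {"credit", "card", "apr"},
    "travel": {"flight", "hotel", "trip"},
}


def build_profile(posts):
    """Count how often each topic's keywords appear across public posts."""
    profile = Counter()
    for post in posts:
        # Lowercase the post and split it into alphabetic words.
        words = re.findall(r"[a-z]+", post.lower())
        for topic, keywords in TOPIC_KEYWORDS.items():
            profile[topic] += sum(1 for w in words if w in keywords)
    return profile


posts = [
    "Just finished a great novel by my favourite author.",
    "Thinking about a new credit card before my trip.",
]
profile = build_profile(posts)
print(profile.most_common())
```

The resulting counts are the kind of "possible preferences" chart the text describes: topics mentioned more often rank higher in the profile.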

Usually, when people hear about online data collecting, their first reaction is to worry about their credit reports and information, which is where online data collecting faces opposition from users. But not many people know that social media data collecting is based purely on the information posted on blogs, social media websites such as Twitter and Facebook, newsgroups, and forums, which also means that these data collecting companies do not have any access to an individual’s credit reports.

You should also note that social media data collecting is nothing but a marketing tool used by financial organizations such as lenders and banks, as well as by non-profit companies, to support their marketing goals and lending decisions. For example, when you write a post about a particular credit card that you want to know about, have heard about, or dislike, that data is used by a data company to give its banking clients your perception of and preferences about that product.

Any company that mines social data has access only to the information that you have made public on the internet. These companies work like search engine crawlers, so everything you publish online can be crawled. Often, data collecting companies work for clients who want to personalize and customize their offers and services.

While this benefits companies looking to target prospective customers, it can also help you with your credit. When you are looking for a loan or some other kind of financial assistance, saying or writing the right things online is very important. Being on the friends lists of people with good credit histories marks you as a person who keeps good or trusted company. This is what some experts call the “like follows like” concept.


Here are some Facebook statistics:

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day
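
To put these daily figures on a more intuitive scale, the short Python calculation below converts some of them to per-second and per-day rates. It takes the quoted numbers as given, assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB), and assumes the Hive scan rate holds steady around the clock; the variable names are made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the list above (taken as given, not independently verified).
new_data_tb_per_day = 500          # "500+" in the source, so a lower bound
hive_scan_tb_per_30_min = 105
queries_per_day = 70_000
photos_per_day = 300_000_000

# Assumption: decimal units, 1 TB = 1,000 GB and 1 PB = 1,000 TB.
ingest_gb_per_second = new_data_tb_per_day * 1_000 / SECONDS_PER_DAY
# Assumption: the 30-minute scan volume repeats for all 48 half-hours in a day.
hive_scan_pb_per_day = hive_scan_tb_per_30_min * 48 / 1_000
queries_per_second = queries_per_day / SECONDS_PER_DAY
photos_per_second = photos_per_day / SECONDS_PER_DAY

print(f"New data ingested: at least {ingest_gb_per_second:.2f} GB per second")
print(f"Data scanned via Hive: about {hive_scan_pb_per_day:.2f} PB per day")
print(f"Queries executed: about {queries_per_second:.2f} per second")
print(f"Photos uploaded: about {photos_per_second:.0f} per second")
```

Under those assumptions, the figures work out to roughly 5.8 GB of new data per second, about 5 PB scanned per day, a little under one query per second, and around 3,500 photos uploaded per second.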