Thursday, May 15, 2014

Hadoop Hive External vs Internal Table

Hive tables can be created as EXTERNAL or INTERNAL. This is a choice that affects how data is loaded, controlled, and managed.

Use EXTERNAL tables when:

  • The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.
  • Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas.
  • You want to use a custom location such as ASV (Azure Storage Vault).
  • Hive should not own the data or control its settings, directories, etc., because another program or process handles those things.
  • You are not creating the table from an existing table (CREATE TABLE ... AS SELECT).

Use INTERNAL tables when:

  • The data is temporary.
  • You want Hive to completely manage the lifecycle of the table and data.
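
To make the difference concrete, here is a minimal Python sketch that builds the two kinds of CREATE TABLE statement. The helper `create_table_ddl`, the table names, the columns, and the `/data/shared/web_logs` path are all made up for illustration; only the `CREATE [EXTERNAL] TABLE ... LOCATION` keywords come from HiveQL itself.

```python
def create_table_ddl(name, columns, external=False, location=None):
    """Build a HiveQL CREATE TABLE statement (illustrative helper, not a Hive API)."""
    # EXTERNAL tells Hive not to take ownership of the underlying files.
    keyword = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    cols = ", ".join(f"{col} {hive_type}" for col, hive_type in columns)
    ddl = f"{keyword} {name} ({cols})"
    if location is not None:
        # LOCATION points the table at an existing directory of data files.
        ddl += f" LOCATION '{location}'"
    return ddl + ";"


# Hypothetical columns and paths, used only for this example.
columns = [("event_time", "STRING"), ("message", "STRING")]

# External: DROP TABLE removes only the metadata, the files stay in place.
external_ddl = create_table_ddl(
    "web_logs", columns, external=True, location="/data/shared/web_logs"
)

# Internal (managed): DROP TABLE removes the metadata and the data.
managed_ddl = create_table_ddl("tmp_web_logs", columns)

print(external_ddl)
print(managed_ddl)
```

The generated statements could then be pasted into the Hive shell; the sketch itself never talks to Hive.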

Wednesday, May 14, 2014

Hadoop Quick Info

Apache Hadoop – An open source software framework for cheaply storing and processing vast amounts of structured and unstructured data.

Flume – A service for collecting, aggregating, and moving large amounts of log and event data into Hadoop.

HBase – A scalable, distributed, column-oriented data store that runs on top of HDFS.

HDFS – An acronym for "Hadoop Distributed File System".

Hive – A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. It allows you to query data using a SQL-like language called HiveQL (HQL).

HiveQL (HQL) – A SQL-like query language for Hive. Queries are compiled into MapReduce jobs that run over data in HDFS.

JobTracker – The service within Hadoop that distributes MapReduce tasks to specific nodes in the cluster.

NameNode – The core of the HDFS file system. The NameNode maintains a record of all files stored on the Hadoop cluster.

Oozie – A workflow scheduler system to manage Apache Hadoop jobs.

Pig – A high-level programming language for creating MapReduce programs used within Hadoop.

Sqoop – A tool for transferring data between Hadoop and relational databases.

YARN – A resource manager for Hadoop 2. YARN is short for "Yet Another Resource Negotiator".

ZooKeeper – A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

Thursday, April 3, 2014

Visual Studio and Browser Link Feature

Announced today at Build 2014.

You can edit the styles and JavaScript loaded in the browser using the browser's developer tools (launched by pressing F12).

As you edit, the changes show up in the Visual Studio code base, and other browsers (say, Chrome) auto-refresh with the changes made in the first browser's developer tools.

Azure Web Jobs

Announced today at Build 2014.

Web Jobs can now run as background jobs within Windows Azure Websites (calling queues, etc.).

There is no need to spin up a new VM or a new cloud service to run background jobs; instead, create Web Jobs as part of Windows Azure Websites.

Wednesday, April 2, 2014

Rethink - 2 in 1 User Experience

I liked this video explaining “Rethink Navigation for 2 in 1s”

- Explaining Easy, OK, Hard to hit areas on Mobile, Tablets & Touch screen laptops.

Research Analysis

Positioning of the main Navigational elements of your App on these device variations

Tuesday, February 18, 2014

Commerce Server 10.1 Setup on Windows Server 2012

In this blog post I am documenting the steps required to install and configure Commerce Server 10.1 on a Windows Server 2012 machine.

List of downloads I did from

Launch CommerceServer- setup file (note: version may vary in future).

Splash screen of Commerce Server 10.1 installation

Accept the Licence Agreement page

Installation in progress

Commerce Server Configuration Wizard. Click Next

MSCS_Admin DB configuration

Add a new user "CS_RunTime"

Configure Staging Service to run with CS_RunTime user

Commerce Server Configuration summary

Commerce Server Configuration - In Progress

Commerce Server Configuration - Completed

Commerce Server Setup - Completed

After installing Commerce Server 10.1 on Windows Server 2012, we can see that the following apps are added to the Metro UI

Wednesday, February 5, 2014

Docker Overview

In this post, I will document the learning I had with Docker.

Container History

- Linux Containers

  • OpenVZ
    • Uses a single patched Linux kernel, so containers run on the architecture and kernel version of the host system.
  • Linux-VServer
    • Virtual Private Server implementation. Created for adding OS level virtualization to the Linux kernel itself.

What's a Container?

  • OS level virtualization.
  • The OS kernel allows multiple isolated user space instances instead of one.
  • Looks like a real server from the operator's point of view.

What is Docker?

Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, OpenStack clusters, public clouds and more.

Technology behind Docker 

  • Developed using Google's Go language
  • Uses Linux kernel cgroups and namespaces
  • AUFS union filesystem
  • LXC (Linux containers)

More to be typed as I learn...

Sunday, January 19, 2014

Android Layouts

In this blog post, I am documenting at a high level what I learned about the Android layout types.
  • Linear Layout (Aligns its child elements in single direction one after another)
    • Horizontal Layout
    • Vertical Layout
  • Relative Layout
    • Aligns child elements relative to their siblings or parent. Helps to avoid nested linear layouts.
  • Frame Layout (Allows child views to be stacked on top of one another)
  • Table Layout (Aligns elements in Row & Column format)

More to be typed...

Friday, January 3, 2014

Powershell commands

In this post I am documenting some PowerShell quick reference commands.

Get Members of a command
c:> get-service | gm

Get-help CMD-NAME
c:> get-help start-service

Get-process -name "Name"
c:> get-process -name "outlook"

Get Process - physical directory path
PS C:\Windows\system32> get-process outlook | dir

Directory: C:\Program Files (x86)\Microsoft Office\Office14

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         3/31/2011   4:08 PM   15933792 OUTLOOK.EXE

c:> get-service -filter * | select -Property name, @{name='ComputerName';expression={$_

Backbone.js overview

Backbone.js is a library of tools that helps to build client side rich web applications.
- Models with Key-Value binding
- Custom Events
- Collections with a rich API of enumerable functions
- Views with declarative event handling
- Connects it all to your existing API over a RESTful JSON interface
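
The first two items, key-value binding and custom events, can be sketched in a few lines. Backbone itself is a JavaScript library, so the Python class below is only an analogy: the `Model` class here is hypothetical and borrows the names `on`, `trigger`, `get`, `set`, and the `change:<attribute>` event convention from Backbone to show the idea, not to reproduce its API.

```python
from collections import defaultdict


class Model:
    """Toy model: attribute changes notify subscribed handlers."""

    def __init__(self, **attributes):
        self._attributes = dict(attributes)
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        # Register a callback for a named event.
        self._handlers[event].append(handler)

    def trigger(self, event, *args):
        # Call every handler registered for this event.
        for handler in list(self._handlers[event]):
            handler(*args)

    def get(self, key):
        return self._attributes.get(key)

    def set(self, key, value):
        # Only fire events when the stored value actually changes.
        if key in self._attributes and self._attributes[key] == value:
            return
        self._attributes[key] = value
        self.trigger(f"change:{key}", self, value)
        self.trigger("change", self)


seen = []
book = Model(title="Draft")
book.on("change:title", lambda model, value: seen.append(value))
book.set("title", "Final")
book.set("title", "Final")  # same value again, so no event fires
print(seen)
```

A view would typically subscribe to these change events and re-render itself, which is what keeps the UI in sync with the data.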

Client-side backbone.js application
- Router(s)
- Views
- Models
- Collections

JSON data

- RESTful endpoint

Advantages
- Fast
- Highly interactive

Disadvantages
- Cannot be indexed by search engines (without extra work)
- Difficult to test (as client side application)
- Security Issues

SPA - Single Page Applications
- Improved User Experience

SPA on Client Side
- User interface
- Logic
- Page Generation

SPA challenges
- Lack of tooling and experience
- Working with different browsers

Routers help simulate page changes and support page history and bookmarking.
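
As a rough illustration of what a router does, the Python sketch below maps URL fragments to handler functions and records each visited fragment. It is an analogy only: the `Router` class, its `route` and `navigate` methods, and the `books/:id` example route are assumptions made for this sketch (the `:name` parameter style mirrors Backbone's route syntax), and a real browser router would hook into the address bar and the browser's history instead of a plain list.

```python
import re


class Router:
    """Toy router: matches URL fragments and remembers visited pages."""

    def __init__(self):
        self._routes = []
        self.history = []

    def route(self, pattern, handler):
        # Turn "books/:id" into the regex ^books/(?P<id>[^/]+)$
        regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
        self._routes.append((re.compile(f"^{regex}$"), handler))

    def navigate(self, fragment):
        # The first matching route wins; its named parameters become keyword args.
        for regex, handler in self._routes:
            match = regex.match(fragment)
            if match:
                self.history.append(fragment)
                return handler(**match.groupdict())
        raise LookupError(f"no route matches {fragment!r}")


router = Router()
router.route("", lambda: "home page")
router.route("books/:id", lambda id: f"book {id}")

page = router.navigate("books/42")
print(page)
print(router.history)
```

Because every "page" is just a fragment handled on the client, it can be bookmarked and revisited without a full page load.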

More to be typed...

JavaScript Frameworks

JavaScript based UI Frameworks
JavaScript Libraries that help in implementing Architectural Patterns