Thursday, 2 May 2019

Top 10 Kubernetes interview questions




Q1. How is Kubernetes different from Docker Swarm?
Features: Kubernetes vs Docker Swarm

Installation & Cluster Configuration
Kubernetes: Setup is very complicated, but once installed the cluster is robust.
Docker Swarm: Installation is very simple, but the cluster is not robust.

GUI
Kubernetes: GUI is the Kubernetes Dashboard.
Docker Swarm: There is no GUI.

Scalability
Kubernetes: Highly scalable and scales fast.
Docker Swarm: Highly scalable and scales 5x faster than Kubernetes.

Auto-scaling
Kubernetes: Kubernetes can do auto-scaling.
Docker Swarm: Docker Swarm cannot do auto-scaling.

Load Balancing
Kubernetes: Manual intervention is needed to load balance traffic between different containers and pods.
Docker Swarm: Docker Swarm does auto load balancing of traffic between containers in the cluster.

Rolling Updates & Rollbacks
Kubernetes: Can deploy rolling updates and does automatic rollbacks.
Docker Swarm: Can deploy rolling updates, but not automatic rollbacks.

Data Volumes
Kubernetes: Can share storage volumes only with the other containers in the same pod.
Docker Swarm: Can share storage volumes with any other container.

Logging & Monitoring
Kubernetes: In-built tools for logging and monitoring.
Docker Swarm: 3rd-party tools like the ELK stack should be used for logging and monitoring.
Q2. What is Kubernetes?
Kubernetes is an open-source container management tool which holds the responsibilities of container deployment, scaling & descaling of containers & load balancing. Being Google's brainchild, it offers an excellent community and works brilliantly with all the cloud providers. So, we can say that Kubernetes is not a containerization platform, but a multi-container management solution.
Q3. How is Kubernetes related to Docker?
It's a known fact that Docker provides the lifecycle management of containers and that runtime containers are built from Docker images. But, since these individual containers have to communicate, Kubernetes is used. So, Docker builds the containers and these containers communicate with each other via Kubernetes. Containers running on multiple hosts can thus be manually linked and orchestrated using Kubernetes.
Q4. What is the difference between deploying applications on hosts and containers?
➢ Refer to the above diagram. The left side of the architecture represents deploying applications on hosts. This kind of architecture has an operating system, and the operating system has a kernel along with the various libraries installed on it that the applications need. In such a framework you can have n number of applications, and all of them share the libraries present in that operating system. While deploying applications in containers, the architecture is a little different.
➢ This kind of architecture has a kernel, and the kernel is the only thing that is common between all the applications. So, if there is a particular application which needs Java, then only that particular application gets access to Java, and if there is another application which needs Python, then only that particular application gets access to Python.
➢ The individual blocks that you can see on the right side of the diagram are basically containerized, and these are isolated from other applications. So, the applications have the necessary libraries and binaries isolated from the rest of the system, and they cannot be encroached upon by any other application.
Q5. What is Container Orchestration?
Consider a scenario where you have 5-6 microservices for an application. Now, these microservices are put in individual containers, but won’t be able to communicate without container orchestration. So, as orchestration means the amalgamation of all instruments playing together in harmony in music, similarly container orchestration means all the services in individual containers working together to fulfill the needs of a single server.
Q6. What is the need for Container Orchestration?
Consider you have 5-6 microservices for a single application performing various tasks, and all these microservices are put inside containers. Now, to make sure that these containers communicate with each other we need container orchestration.
As you can see in the above diagram, many challenges arise without the use of container orchestration. So, container orchestration came into the picture to overcome these challenges.
Q7. What are the features of Kubernetes?
The features of Kubernetes are as follows:
• Automated scheduling
• Self-healing capabilities
• Automated rollouts & rollbacks
• Horizontal scaling & load balancing
• Service discovery
• Storage orchestration
• Secret & configuration management
Q8. How does Kubernetes simplify containerized Deployment?
As a typical application would have a cluster of containers running across multiple hosts, all these containers would need to talk to each other. So, to do this you need something that can load balance, scale & monitor the containers. Since Kubernetes is cloud-agnostic and can run on any public/private provider, it should be your choice to simplify containerized deployment.
Q9. What do you know about clusters in Kubernetes?
The fundamental idea behind Kubernetes is that we can enforce desired state management, by which I mean that we can feed the cluster services a specific configuration, and it will be up to the cluster services to go out and run that configuration in the infrastructure.
So, as you can see in the above diagram, the deployment file will have all the configurations required to be fed into the cluster services. Now, the deployment file will be fed to the API and then it will be up to the cluster services to figure out how to schedule these pods in the environment and make sure that the right number of pods are running.
So, the API which sits in front of services, the worker nodes & the Kubelet process that the nodes run, all together make up the Kubernetes Cluster.
Q10. What is Google Container Engine?
Google Container Engine (GKE) is Google's managed platform for Docker containers and clusters. This Kubernetes-based engine supports only those clusters which run within Google's public cloud services.
Q11. What is Heapster?
Heapster is a cluster-wide aggregator of data provided by Kubelet running on each node. This container management tool is supported natively on Kubernetes cluster and runs as a pod, just like any other pod in the cluster. So, it basically discovers all nodes in the cluster and queries usage information from the Kubernetes nodes in the cluster, via on-machine Kubernetes agent.
Q12. What is Minikube?
Minikube is a tool that makes it easy to run Kubernetes locally. This runs a single-node Kubernetes cluster inside a virtual machine.
Q13. What is Kubectl?
Kubectl is the command-line tool using which you can pass commands to the cluster. So, it basically provides the CLI to run commands against the Kubernetes cluster, with various ways to create and manage the Kubernetes components.
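For illustration, here is a minimal sketch of doing the equivalent of kubectl get pods --all-namespaces with the official Kubernetes Python client (assuming the client is installed and a kubeconfig is present; this is not shown in the original article):

from kubernetes import client, config

# Load credentials from ~/.kube/config, the same file kubectl uses
config.load_kube_config()

v1 = client.CoreV1Api()

# Equivalent of: kubectl get pods --all-namespaces
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)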
Q14. What is Kubelet?
This is an agent service which runs on each node and enables the slave to communicate with the master. So, Kubelet works on the description of containers provided to it in the PodSpec and makes sure that the containers described in the PodSpec are healthy and running.
Q15. What do you understand by a node in Kubernetes?
A node is a worker machine in Kubernetes; it may be a virtual machine or a physical machine, depending on the cluster. Each node runs the services necessary to run pods (such as the kubelet, a container runtime and kube-proxy) and is managed by the master components.
Q1. What are the different components of Kubernetes Architecture?
The Kubernetes Architecture has mainly 2 components – the master node and the worker node. As you can see in the below diagram, the master and the worker nodes have many inbuilt components within them. The master node has the kube-controller-manager, kube-apiserver, kube-scheduler, etcd. Whereas the worker node has kubelet and kube-proxy running on each node.
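As a quick sanity check, the cluster's nodes and the control-plane pods can be listed programmatically. A small sketch with the Kubernetes Python client (assuming a reachable cluster; control-plane components usually run as pods in the kube-system namespace):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Nodes registered with the cluster (master and worker)
for node in v1.list_node().items:
    print("node:", node.metadata.name)

# Control-plane components (kube-apiserver, kube-scheduler,
# kube-controller-manager, etcd) typically appear here
for pod in v1.list_namespaced_pod("kube-system").items:
    print("kube-system pod:", pod.metadata.name)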
Q2. What do you understand by Kube-proxy?
Kube-proxy can run on each and every node and can do simple TCP/UDP packet forwarding across backend network services. So basically, it is a network proxy which reflects the services as configured in the Kubernetes API on each node. So, the Docker-linkable compatible environment variables provide the cluster IPs and ports which are opened by the proxy.
Q3. Can you brief on the working of the master node in Kubernetes?
Kubernetes master controls the nodes and inside the nodes the containers are present. Now, these individual containers are contained inside pods and inside each pod, you can have a varying number of containers based upon the configuration and requirements. So, if the pods have to be deployed, then they can either be deployed using the user interface or the command-line interface. Then, these pods are scheduled on the nodes, and based on the resource requirements, the pods are allocated to these nodes. The kube-apiserver makes sure that there is communication established between the Kubernetes node and the master components.
Q4. What is the role of kube-apiserver and kube-scheduler?
The kube-apiserver follows the scale-out architecture and is the front end of the master node control plane. This exposes all the APIs of the Kubernetes master node components and is responsible for establishing communication between the Kubernetes node and the Kubernetes master components.
The kube-scheduler is responsible for distribution and management of workload on the worker nodes. So, it selects the most suitable node to run the unscheduled pod based on resource requirement and keeps a track of resource utilization. It makes sure that the workload is not scheduled on nodes which are already full.
Q5. Can you brief about the Kubernetes controller manager?
Multiple controller processes run on the master node but are compiled together to run as a single process which is the Kubernetes Controller Manager. So, Controller Manager is a daemon that embeds controllers and does namespace creation and garbage collection. It owns the responsibility and communicates with the API server to manage the end-points.
So, the different types of controllers that run as part of the controller manager on the master node are:
• Node controller
• Replication controller
• Endpoints controller
• Service account & token controllers
Q6. What is ETCD?
Etcd is written in the Go programming language and is a distributed key-value store used for coordinating distributed work. So, Etcd stores the configuration data of the Kubernetes cluster, representing the state of the cluster at any given point in time.
Q7. What are the different types of services in Kubernetes?
The following are the different types of services used:
• ClusterIP: exposes the service on a cluster-internal IP, so it is reachable only from within the cluster (this is the default type).
• NodePort: exposes the service on a static port on each node's IP.
• LoadBalancer: exposes the service externally using the cloud provider's load balancer.
• ExternalName: maps the service to an external DNS name by returning a CNAME record.
Q8. What do you understand by load balancer in Kubernetes?
A load balancer is one of the most common and standard ways of exposing service. There are two types of load balancer used based on the working environment i.e. either the Internal Load Balancer or the External Load Balancer. The Internal Load Balancer automatically balances load and allocates the pods with the required configuration whereas the External Load Balancer directs the traffic from the external load to the backend pods.
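As a sketch, a LoadBalancer-type Service can be created with the Kubernetes Python client like this (the service name, selector and ports are made-up placeholders):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# A Service of type LoadBalancer exposes the matching pods through the
# cloud provider's external load balancer
svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="web-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        selector={"app": "web"},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
v1.create_namespaced_service(namespace="default", body=svc)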
Q9. What is Ingress network, and how does it work?
Ingress network is a collection of rules that act as an entry point to the Kubernetes cluster. It allows inbound connections, which can be configured to give services externally reachable URLs, load balance traffic, or offer name-based virtual hosting. So, Ingress is an API object that manages external access to the services in a cluster, usually over HTTP, and is the most powerful way of exposing a service.
Now, let me explain to you the working of Ingress network with an example.
There are 2 nodes, each having pod and root network namespaces with a Linux bridge. In addition to this, there is also a new virtual ethernet device called flannel0 (a network plugin) added to the root network.
Now, suppose we want the packet to flow from pod1 to pod 4. Refer to the below diagram.
• So, the packet leaves pod1’s network at eth0 and enters the root network at veth0.
• Then it is passed on to cbr0, which makes the ARP request to find the destination and it is found out that nobody on this node has the destination IP address.
• So, the bridge sends the packet to flannel0 as the node’s route table is configured with flannel0.
• Now, the flannel daemon talks to the API server of Kubernetes to know all the pod IPs and their respective nodes to create mappings for pods IPs to node IPs.
• The network plugin wraps this packet in a UDP packet with extra headers, changing the source and destination IPs to their respective nodes, and sends this packet out via eth0.
• Now, since the route table already knows how to route traffic between nodes, it sends the packet to the destination node2.
• The packet arrives at eth0 of node2 and is handed to flannel0, which de-capsulates it and emits it back into the root network namespace.
• Again, the packet is forwarded to the Linux bridge to make an ARP request to find out the IP that belongs to veth1.
• The packet finally crosses the root network and reaches the destination Pod4.
Q10. What do you understand by Cloud controller manager?
The Cloud Controller Manager is responsible for persistent storage, network routing, abstracting the cloud-specific code from the core Kubernetes specific code, and managing the communication with the underlying cloud services. It might be split out into several different containers depending on which cloud platform you are running on and then it enables the cloud vendors and Kubernetes code to be developed without any inter-dependency. So, the cloud vendor develops their code and connects with the Kubernetes cloud-controller-manager while running the Kubernetes.
The various controllers embedded in the cloud controller manager are as follows:
• Node controller
• Route controller
• Service controller
Q11. What is Container resource monitoring?
For users, it is really important to understand the performance of the application and resource utilization at all the different abstraction layers. Kubernetes factored the management of the cluster by creating abstractions at different levels like containers, pods, services and the whole cluster. Now, each of these levels can be monitored, and this is nothing but container resource monitoring.
The various container resource monitoring tools are as follows: Heapster, cAdvisor, Prometheus, Grafana and InfluxDB.
Q12. What is the difference between a replica set and replication controller?
Replica Set and Replication Controller do almost the same thing. Both of them ensure that a specified number of pod replicas are running at any given time. The difference comes with the usage of selectors to replicate pods. Replica Sets use set-based selectors while Replication Controllers use equality-based selectors.
• Equality-Based Selectors: This type of selector allows filtering by label key and value. So, in layman's terms, the equality-based selector will only look for the pods which have exactly the same value as the label. Example: Suppose your label key says app=nginx; then, with this selector, you can only look for those pods with label app equal to nginx.
• Set-Based Selectors: This type of selector allows filtering keys according to a set of values. So, in other words, the set-based selector will look for pods whose label value has been mentioned in the set. Example: Say your label says app in (nginx, NPS, Apache). Then, with this selector, if your app label is equal to any of nginx, NPS, or Apache, the selector will treat it as a match (a code sketch of both selector styles follows below).
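Here is a hedged sketch of both selector styles using the Python client's label_selector parameter (the namespace and label values are only illustrative):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Equality-based selector (the style used by Replication Controllers)
rc_style = v1.list_namespaced_pod("default", label_selector="app=nginx")

# Set-based selector (the style used by Replica Sets)
rs_style = v1.list_namespaced_pod(
    "default", label_selector="app in (nginx, NPS, Apache)"
)

print(len(rc_style.items), "pods matched the equality-based selector")
print(len(rs_style.items), "pods matched the set-based selector")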
Q13. What is a Headless Service?
A Headless Service is similar to a 'normal' service but does not have a Cluster IP. This service enables you to directly reach the pods without having to access them through a proxy.
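A minimal sketch of creating a headless Service with the Python client (the name, selector and port are placeholders; setting cluster_ip to "None" is what makes it headless):

from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

headless = client.V1Service(
    metadata=client.V1ObjectMeta(name="db-headless"),
    spec=client.V1ServiceSpec(
        cluster_ip="None",            # "None" makes the Service headless
        selector={"app": "db"},
        ports=[client.V1ServicePort(port=5432)],
    ),
)
v1.create_namespaced_service(namespace="default", body=headless)

# DNS lookups for a headless Service return the pod IPs directly,
# so clients reach the pods without going through a cluster IP / proxy.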
Q14. What are the best security measures that you can take while using Kubernetes?
The following are the best security measures that you can follow while using Kubernetes:
Q15. What are federated clusters?
Multiple Kubernetes clusters can be managed as a single cluster with the help of federated clusters. So, you can create multiple Kubernetes clusters within a data center/cloud and use federation to control/manage them all at one place.
The federated clusters can achieve this by doing the following two things. Refer to the below diagram.
• Sync resources across clusters: federation keeps resources in multiple clusters in sync.
• Cross-cluster discovery: federation provides the ability to automatically configure DNS servers and load balancers with backends from all the participating clusters.
Scenario 1: Suppose a company built on monolithic architecture handles numerous products. Now, as the company expands in today's scaling industry, their monolithic architecture starts causing problems.
How do you think the company shifted from monolithic to microservices and deployed their services in containers?
Solution:
As the company's goal is to shift from their monolithic application to microservices, they can end up building it piece by piece, in parallel, and just switch configurations in the background. Then they can put each of these newly built microservices on the Kubernetes platform. So, they can start by migrating one or two services at a time and monitor them to make sure everything is running stable. Once they feel everything is going well, they can migrate the rest of the application into their Kubernetes cluster.
Scenario 2: Consider a multinational company with a highly distributed system, with a large number of data centers, virtual machines, and many employees working on various tasks.
How do you think can such a company manage all the tasks in a consistent way with Kubernetes?
Solution:
As all of us know, I.T. departments launch thousands of containers, with tasks running across numerous nodes across the world in a distributed system.
In such a situation the company can use something that offers them agility, scale-out capability, and DevOps practices for their cloud-based applications.
So, the company can therefore use Kubernetes to customize their scheduling architecture and support multiple container formats. This makes it possible to define affinity between container tasks, which gives greater efficiency, along with extensive support for various container networking and storage solutions.
Scenario 3: Consider a situation, where a company wants to increase its efficiency and the speed of its technical operations by maintaining minimal costs.
How do you think the company will try to achieve this?
Solution:
The company can implement the DevOps methodology by building a CI/CD pipeline, but one problem that may occur here is that the configurations may take time to get up and running. So, after implementing the CI/CD pipeline, the company's next step should be to work in the cloud environment. Once they start working in the cloud environment, they can schedule containers on a cluster and orchestrate them with the help of Kubernetes. This kind of approach will help the company reduce their deployment time and also get faster across various environments.
Scenario 4: Suppose a company wants to revise its deployment methods and wants to build a platform which is much more scalable and responsive.
How do you think this company can achieve this to satisfy their customers?
Solution:
In order to give millions of clients the digital experience they would expect, the company needs a platform that is scalable, and responsive, so that they could quickly get data to the client website. Now, to do this the company should move from their private data centers (if they are using any) to any cloud environment such as AWS. Not only this, but they should also implement the microservice architecture so that they can start using Docker containers. Once they have the base framework ready, then they can start using the best orchestration platform
available i.e. Kubernetes. This would enable the teams to be autonomous in building applications and delivering them very quickly.
Scenario 5: Consider a multinational company with a highly distributed system, looking forward to solving the monolithic code base problem.
How do you think the company can solve their problem?
Solution
Well, to solve the problem, they can shift their monolithic code base to a microservice design, and then each microservice can be considered as a container. So, all these containers can be deployed and orchestrated with the help of Kubernetes.
Scenario 6: All of us know that the shift from monolithic to microservices solves the problem from the development side, but increases the problem at the deployment side.
How can the company solve the problem on the deployment side?
Solution
The team can experiment with container orchestration platforms, such as Kubernetes, and run them in their data centers. So, with this, the company can generate a templated application, deploy it within five minutes, and have actual instances containerized in the staging environment at that point. This kind of Kubernetes project will have dozens of microservices running in parallel to improve the production rate; even if a node goes down, the workload can be rescheduled immediately without performance impact.
Scenario 7: Suppose a company wants to optimize the distribution of its workloads, by adopting new technologies.
How can the company achieve this distribution of resources efficiently?
Solution
The solution to this problem is none other than Kubernetes. Kubernetes makes sure that the resources are optimized efficiently, and only those resources are used which are needed by that particular application. So, with the usage of the best container orchestration tool, the company can achieve the distribution of resources efficiently.
Scenario 8: Consider a carpooling company that wants to increase the number of its servers by simultaneously scaling its platform.
How do you think will the company deal with the servers and their installation?
Solution
The company can adopt the concept of containerization. Once they deploy all their applications into containers, they can use Kubernetes for orchestration and use container monitoring tools like Prometheus to monitor the actions in containers. Such usage of containers gives them better capacity planning in the data center, because they will now have fewer constraints due to the abstraction between the services and the hardware they run on.
Scenario 9: Consider a scenario where a company wants to provide all the required hand-outs to its customers having various environments.
How do you think they can achieve this critical target in a dynamic manner?
Solution
The company can use Docker environments to put together a cross-sectional team to build a web application using Kubernetes. This kind of framework will help the company achieve the goal of getting the required things into production within the shortest time frame. So, with such a setup running, the company can give the hand-outs to all the customers having various environments.
Scenario 10: Suppose a company wants to run various workloads on different cloud infrastructure from bare metal to a public cloud.
How will the company achieve this in the presence of different interfaces?
Solution
The company can decompose its infrastructure into microservices and then adopt Kubernetes. This will let the company run various workloads on different cloud infrastructures.
Q1. What are minions in Kubernetes cluster?
a. They are components of the master node.
b. They are the work-horse / worker node of the cluster.[Ans]
c. They are monitoring engine used widely in kubernetes.
d. They are docker container service.
Q2. Kubernetes cluster data is stored in which of the following?
a. Kube-apiserver
b. Kubelet
c. Etcd[Ans]
d. None of the above
Q3. Which of them is a Kubernetes Controller?
a. ReplicaSet
b. Deployment
c. Rolling Updates
d. Both ReplicaSet and Deployment[Ans]
Q4. Which of the following are core Kubernetes objects?
a. Pods
b. Services
c. Volumes
d. All of the above[Ans]
Q5. The Kubernetes Network proxy runs on which node?
a. Master Node
b. Worker Node
c. All the nodes[Ans]
d. None of the above
Q6. What are the responsibilities of a node controller?
a. To assign a CIDR block to the nodes
b. To maintain the list of nodes
c. To monitor the health of the nodes
d. All of the above[Ans]
Q7. What are the responsibilities of Replication Controller?
a. Update or delete multiple pods with a single command
b. Helps to achieve the desired state
c. Creates a new pod, if the existing pod crashes
d. All of the above[Ans]
Q8. How to define a service without a selector?
a. Specify the external name[Ans]
b. Specify an endpoint with IP Address and port
c. Just by specifying the IP address
d. Specifying the label and api-version
Q9. What did the 1.8 version of Kubernetes introduce?
a. Taints and Tolerations[Ans]
b. Cluster level Logging
c. Secrets
d. Federated Clusters
Q10. The handler invoked by Kubelet to check whether a port on the container's IP address is open or not is?
a. HTTPGetAction
b. ExecAction
c. TCPSocketAction[Ans]
d. None of the above

Top 50 AWS Interview Questions




Section 1: What is Cloud Computing? Can you talk about and compare any two popular Cloud Service Providers?
Amazon Web Services Vs Microsoft Azure
Parameters: AWS vs Azure

Initiation
AWS: 2006
Azure: 2010

Market Share
AWS: 4x
Azure: x

Implementation
AWS: Less options
Azure: More experimentation possible

Features
AWS: Widest range of options
Azure: Good range of options

App Hosting
AWS: Not as good as Azure
Azure: Better

Development
AWS: Varied & great features
Azure: Varied & great features

IaaS Offerings
AWS: Good market hold
Azure: Better offerings than AWS
1. Try this AWS scenario based interview question. I have some private servers on my premises, also I have distributed some of my workload on the public cloud, what is this architecture called?
A. Virtual Private Network
B. Private Cloud
C. Virtual Private Cloud
D. Hybrid Cloud
Answer D.
Explanation: This type of architecture would be a hybrid cloud. Why? Because we are using both the public cloud and your on-premises servers, i.e. the private cloud. To make this hybrid architecture easy to use, wouldn't it be better if your private and public cloud were all on the same network (virtually)? This is established by including your public cloud servers in a virtual private cloud, and connecting this virtual cloud with your on-premises servers using a VPN (Virtual Private Network).
Section 2: Amazon EC2 Interview Questions
2. What does the following command do with respect to the Amazon EC2 security groups?
ec2-create-group CreateSecurityGroup
A. Groups the user created security groups into a new group for easy access.
B. Creates a new security group for use with your account.
C. Creates a new group inside the security group.
D. Creates a new rule inside the security group.
➢ Answer B.
➢ Explanation: A security group is just like a firewall; it controls the traffic in and out of your instance (in AWS terms, the inbound and outbound traffic). The command mentioned is pretty straightforward: it says create security group, and it does just that. Moving along, once your security group is created, you can add different rules to it. For example, if you have an RDS instance, to access it you have to add the public IP address of the machine from which you want to access the instance to its security group.
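The ec2-create-group command comes from the old EC2 command-line tools; a rough boto3 equivalent of creating the group and adding a rule looks like this (the VPC ID, port and CIDR are placeholders, not values from the question):

import boto3

ec2 = boto3.client("ec2")

# Create the security group (roughly what ec2-create-group does)
sg = ec2.create_security_group(
    GroupName="CreateSecurityGroup",
    Description="Example security group",
    VpcId="vpc-0123456789abcdef0",       # placeholder VPC ID
)

# Add an inbound rule, e.g. allow MySQL access from one trusted public IP
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "IpRanges": [{"CidrIp": "203.0.113.10/32"}],   # placeholder CIDR
    }],
)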
➢ 3. Here is aws scenario based interview question. You have a video trans-coding application. The videos are processed according to a queue. If the processing of a video is interrupted in one instance, it is resumed in another instance. Currently there is a huge back-log of videos which needs to be processed, for this you need to add more instances, but you need these instances only until your backlog is reduced. Which of these would be an efficient way to do it?
➢ You should be using an On Demand instance for the same. Why? First of all, the workload has to be processed now, meaning it is urgent, secondly you don’t need them once your backlog is cleared, therefore Reserved Instance is out of the picture, and since the work is urgent, you cannot stop the work on your instance just because the spot price spiked, therefore Spot Instances shall also not be used. Hence On-Demand instances shall be the right choice in this case.
➢ 4. You have a distributed application that periodically processes large volumes of data across multiple Amazon EC2 Instances. The application is designed to recover gracefully from Amazon EC2 instance failures. You are required to accomplish this task in the most cost effective way.
Which of the following will meet your requirements?
A. Spot Instances
B. Reserved instances
C. Dedicated instances
D. On-Demand instances
Answer: A
Explanation: Since the work we are addressing here is not continuous, a reserved instance shall be idle at times, same goes with On Demand instances. Also it does not make sense to launch an On Demand instance whenever work comes up, since it is expensive. Hence Spot Instances will be the right fit because of their low rates and no long term commitments.
5. How is stopping and terminating an instance different from each other?
Stopping, starting and terminating are the main operations on an EC2 instance; let's discuss stopping and terminating in detail:
• Stopping and Starting an instance: When an instance is stopped, the instance performs a normal shutdown and then transitions to a stopped state. All of its Amazon EBS volumes remain attached, and you can start the instance again at a later time. You are not charged for additional instance hours while the instance is in a stopped state.
• Terminating an instance: When an instance is terminated, the instance performs a normal shutdown, then the attached Amazon EBS volumes are deleted unless the volume's deleteOnTermination attribute is set to false. The instance itself is also deleted, and you can't start the instance again at a later time.
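Both operations map to simple API calls; a sketch with boto3 (the instance ID is a placeholder):

import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"   # placeholder

# Stop: the instance can be started again later and its EBS volumes stay attached
ec2.stop_instances(InstanceIds=[instance_id])

# Start it again at a later time
ec2.start_instances(InstanceIds=[instance_id])

# Terminate: the instance is gone for good; EBS volumes whose
# DeleteOnTermination attribute is true are deleted as well
ec2.terminate_instances(InstanceIds=[instance_id])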
6. If I want my instance to run on a single-tenant hardware, which value do I have to set the instance’s tenancy attribute to?
A. Dedicated
B. Isolated
C. One
D. Reserved
Answer A.
Explanation: The Instance tenancy attribute should be set to Dedicated Instance. The rest of the values are invalid.
7. When will you incur costs with an Elastic IP address (EIP)?
A. When an EIP is allocated.
B. When it is allocated and associated with a running instance.
C. When it is allocated and associated with a stopped instance.
D. Costs are incurred regardless of whether the EIP is associated with a running instance.
Answer C.
Explanation: You are not charged if only one Elastic IP address is attached to your running instance. But you do get charged in the following conditions:
• When you use more than one Elastic IPs with your instance.
• When your Elastic IP is attached to a stopped instance.
• When your Elastic IP is not attached to any instance.
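For reference, allocating an Elastic IP and associating it with a running instance with boto3 looks roughly like this (the instance ID is a placeholder):

import boto3

ec2 = boto3.client("ec2")

# Allocate a new Elastic IP in the VPC scope
eip = ec2.allocate_address(Domain="vpc")

# Associate it with a running instance; while it stays associated with
# exactly one running instance, the address itself incurs no charge
ec2.associate_address(
    InstanceId="i-0123456789abcdef0",   # placeholder
    AllocationId=eip["AllocationId"],
)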
8. How is a Spot instance different from an On-Demand instance or Reserved Instance?
First of all, let's understand that Spot Instances, On-Demand instances and Reserved Instances are all pricing models. Moving along, Spot instances provide the ability for customers to purchase compute capacity with no upfront commitment, at hourly rates usually lower than the On-Demand rate in each region. Spot instances work like bidding; the bidding price is called the Spot Price. The Spot Price fluctuates based on supply and demand for instances, but customers will never pay more than the maximum price they have specified. If the Spot Price moves higher than a customer's maximum price, the customer's EC2 instance will be shut down automatically. But the reverse is not true: if the Spot price comes down again, your EC2 instance will not be launched automatically; one has to do that manually. With Spot and On-Demand instances, there is no commitment for the duration from the user's side, whereas with Reserved Instances one has to stick to the time period that was chosen.
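A hedged sketch of placing a Spot request with boto3 (the AMI ID, instance type and maximum price are placeholders):

import boto3

ec2 = boto3.client("ec2")

# Request a one-time Spot instance with the maximum price we are willing to pay;
# the request is fulfilled only while the Spot price stays at or below it
ec2.request_spot_instances(
    SpotPrice="0.05",                        # placeholder max price (USD/hour)
    InstanceCount=1,
    Type="one-time",
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",  # placeholder AMI
        "InstanceType": "m5.large",          # placeholder instance type
    },
)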
9. Are the Reserved Instances available for Multi-AZ Deployments?
A. Multi-AZ Deployments are only available for Cluster Compute instances types
B. Available for all instance types
C. Only available for M3 instance types
D. Not Available for Reserved Instances
Answer B.
Explanation: Reserved Instances is a pricing model, which is available for all instance types in EC2.
10. How to use the processor state control feature available on the c4.8xlarge instance?
➢ The processor state control consists of 2 states:
➢ The C state – Sleep state varying from c0 to c6. C6 being the deepest sleep state for a processor
➢ The P state – Performance state p0 being the highest and p15 being the lowest possible frequency.
➢ Now, why the C state and P state? Processors have cores, and these cores need thermal headroom to boost their performance. Since all the cores are on the processor, the temperature should be kept at an optimal state so that all the cores can perform at their highest performance.
➢ Now how will these states help in that? If a core is put into sleep state it will reduce the overall temperature of the processor and hence other cores can perform better. Now the same can be synchronized with other cores, so that the processor can boost as many cores it can by timely putting other cores to sleep, and thus get an overall performance boost.
➢ Concluding, the C and P state can be customized in some EC2 instances like the c4.8xlarge instance and thus you can customize the processor according to your workload.
11. What kind of network performance parameters can you expect when you launch instances in cluster placement group?
The network performance depends on the instance type and network performance specification; if launched in a placement group you can expect up to:
• 10 Gbps in a single flow,
• 20 Gbps in multi-flow, i.e. full duplex
• Network traffic outside the placement group will be limited to 5 Gbps (full duplex).
12. To deploy a 4 node cluster of Hadoop in AWS which instance type can be used?
First let's understand what actually happens in a Hadoop cluster. The Hadoop cluster follows a master-slave concept: the master machine processes all the data, while slave machines store the data and act as data nodes. Since all the storage happens at the slaves, a higher-capacity hard disk is recommended for them, and since the master does all the processing, a higher RAM and a much better CPU are required. Therefore, you can select the configuration of your machine depending on your workload. For example, in this case c4.8xlarge would be preferred for the master machine, whereas for the slave machines we can select an i2.large instance. If you don't want to deal with configuring your instance and installing a Hadoop cluster manually, you can straight away launch
an Amazon EMR (Elastic Map Reduce) instance which automatically configures the servers for you. You dump your data to be processed in S3, EMR picks it from there, processes it, and dumps it back into S3.
13. Where do you think an AMI fits, when you are designing an architecture for a solution?
AMIs (Amazon Machine Images) are like templates of virtual machines, and an instance is derived from an AMI. AWS offers pre-baked AMIs which you can choose while you are launching an instance; some AMIs are not free and can be bought from the AWS Marketplace. You can also choose to create your own custom AMI, which would help you save space on AWS. For example, if you don't need a set of software on your installation, you can customize your AMI to exclude it. This makes it cost-efficient, since you are removing the unwanted things.
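Baking the current state of an already configured instance into a reusable custom AMI is a single API call; a sketch with boto3 (the instance ID and names are placeholders):

import boto3

ec2 = boto3.client("ec2")

# Create a custom AMI (template) from an existing, configured instance
image = ec2.create_image(
    InstanceId="i-0123456789abcdef0",    # placeholder instance
    Name="my-custom-webserver-ami",      # placeholder name
    Description="Pre-configured web server image",
    NoReboot=True,                       # do not stop the instance while imaging
)
print(image["ImageId"])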
14. How do you choose an Availability Zone?
Let’s understand this through an example, consider there’s a company which has user base in India as well as in the US.
Let us see how we will choose the region for this use case :
➢ So, with reference to the above figure, the regions to choose between are Mumbai and North Virginia. Now let us first compare the pricing: you have hourly prices, which can be converted to your per-month figure. Here North Virginia emerges as the winner. But pricing cannot be the only parameter to consider. Performance should also be kept in mind, so let's look at latency as well. Latency is basically the time that a server takes to respond to your requests, i.e. the response time. North Virginia wins again!
➢ So concluding, North Virginia should be chosen for this use case.
15. Is one Elastic IP address enough for every instance that I have running?
Depends! Every instance comes with its own private and public address. The private address is associated exclusively with the instance and is returned to Amazon EC2 only when it is stopped or terminated. Similarly, the public address is associated exclusively with the instance until it is stopped or terminated. However, this can be replaced by the Elastic IP address, which stays with the instance as long as the user doesn’t manually detach it. But what if you are hosting multiple websites on your EC2 server, in that case you may require more than one Elastic IP address.
16. What are the best practices for Security in Amazon EC2?
There are several best practices to secure Amazon EC2. A few of them are given below:
• Use AWS Identity and Access Management (IAM) to control access to your AWS resources.
• Restrict access by only allowing trusted hosts or networks to access ports on your instance.
• Review the rules in your security groups regularly, and ensure that you apply the principle of least privilege – only open up permissions that you require.
• Disable password-based logins for instances launched from your AMI. Passwords can be found or cracked, and are a security risk.
Section 3: Amazon Storage
17. Another scenario based interview question. You need to configure an Amazon S3 bucket to serve static assets for your public-facing web application. Which method will ensure that all objects uploaded to the bucket are set to public read?
A. Set permissions on the object to public read during upload.
B. Configure the bucket policy to set all objects to public read.
C. Use AWS Identity and Access Management roles to set the bucket to public read.
D. Amazon S3 objects default to public read, so no action is needed.
Answer B.
Explanation: Rather than making changes to every object, it's better to set the policy for the whole bucket. IAM roles are used to grant more granular permissions to users and services, not to make objects public; since this is a public website, a bucket policy that grants public read on all objects is the simplest approach (objects are not public by default).
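A sketch of such a bucket policy applied with boto3 (the bucket name is a placeholder):

import json
import boto3

s3 = boto3.client("s3")
bucket = "my-static-assets-bucket"       # placeholder

# One bucket policy that grants public read on every object in the bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))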
18. A customer wants to leverage Amazon Simple Storage Service (S3) and Amazon Glacier as part of their backup and archive infrastructure. The customer plans to use third-party software to support this integration. Which approach will limit the access of the third party software to only the Amazon S3 bucket named “company-backup”?
A. A custom bucket policy limited to the Amazon S3 API in three Amazon Glacier archive “company-backup”
B. A custom bucket policy limited to the Amazon S3 API in “company-backup”
C. A custom IAM user policy limited to the Amazon S3 API for the Amazon Glacier archive “company-backup”.
D. A custom IAM user policy limited to the Amazon S3 API in “company-backup”.
Answer D.
Explanation: Taking a cue from the previous question, this use case involves more granular permissions, hence IAM would be used here.
19. Can S3 be used with EC2 instances, if yes, how?
Yes, it can be used for instances with root devices backed by local instance storage. By using Amazon S3, developers have access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. In order to execute systems in the Amazon EC2 environment, developers use the tools provided to load their Amazon Machine Images (AMIs) into Amazon S3 and to move them between Amazon S3 and Amazon EC2.
Another use case could be for websites hosted on EC2 to load their static content from S3.
20. A customer implemented AWS Storage Gateway with a gateway-cached volume at their main office. An event takes the link between the main and branch office offline. Which methods will enable the branch office to access their data?
A. Restore by implementing a lifecycle policy on the Amazon S3 bucket.
B. Make an Amazon Glacier Restore API call to load the files into another Amazon S3 bucket within four to six hours.
C. Launch a new AWS Storage Gateway instance AMI in Amazon EC2, and restore from a gateway snapshot.
D. Create an Amazon EBS volume from a gateway snapshot, and mount it to an Amazon EC2 instance.
Answer C.
Explanation: The fastest way to do it would be launching a new storage gateway instance. Why? Since time is the key factor which drives every business, troubleshooting the broken link would take more time. Rather, we can just restore the previous working state of the storage gateway on a new instance.
21. When you need to move data over long distances using the internet, for instance across countries or continents to your Amazon S3 bucket, which method or service will you use?
A. Amazon Glacier
B. Amazon CloudFront
C. Amazon Transfer Acceleration
D. Amazon Snowball
Answer C.
Explanation: You would not use Snowball, because for now the Snowball service does not support cross-region data transfer, and since we are transferring across countries, Snowball cannot be used. Transfer Acceleration shall be the right choice here, as it speeds up your data transfer by up to 300% compared to normal transfer speed, using optimized network paths and Amazon's edge locations.
22. How can you speed up data transfer in Snowball?
The data transfer can be increased in the following way:
• By performing multiple copy operations at one time i.e. if the workstation is powerful enough, you can initiate multiple cp commands each from different terminals, on the same Snowball device.
• Copying from multiple workstations to the same snowball.
• Transferring large files or creating a batch of small files; this will reduce the encryption overhead.
• Eliminating unnecessary hops i.e. make a setup where the source machine(s) and the snowball are the only machines active on the switch being used, this can hugely improve performance.
Section 4: AWS VPC
23. If you want to launch Amazon Elastic Compute Cloud (EC2) instances and assign each instance a predetermined private IP address you should:
A. Launch the instance from a private Amazon Machine Image (AMI).
B. Assign a group of sequential Elastic IP address to the instances.
C. Launch the instances in the Amazon Virtual Private Cloud (VPC).
D. Launch the instances in a Placement Group.
Answer C.
Explanation: The best way of connecting to your cloud resources (for example, EC2 instances) from your own data center (for example, a private cloud) is a VPC. Once you connect your datacenter to the VPC in which your instances are present, each instance is assigned a private IP address which can be accessed from your datacenter. Hence, you can access your public cloud resources as if they were on your own network.
24. Can I connect my corporate datacenter to the Amazon Cloud?
Yes, you can do this by establishing a VPN(Virtual Private Network) connection between your company’s network and your VPC (Virtual Private Cloud), this will allow you to interact with your EC2 instances as if they were within your existing network.
25. Is it possible to change the private IP addresses of an EC2 while it is running/stopped in a VPC?
The primary private IP address is attached to the instance throughout its lifetime and cannot be changed; however, secondary private addresses can be unassigned, assigned or moved between interfaces or instances at any point.
26. Why do you make subnets?
A. Because there is a shortage of networks
B. To efficiently utilize networks that have a large no. of hosts.
C. Because there is a shortage of hosts.
D. To efficiently utilize networks that have a small no. of hosts.
Answer B.
Explanation: If there is a network which has a large no. of hosts, managing all these hosts can be a tedious job. Therefore we divide this network into subnets (sub-networks) so that managing these hosts becomes simpler.
27. Which of the following is true?
A. You can attach multiple route tables to a subnet
B. You can attach multiple subnets to a route table
C. Both A and B
D. None of these.
Answer B.
Explanation: Route tables are used to route network packets; having multiple route tables in a subnet would lead to confusion as to where the packet has to go. Therefore, there is only one route table per subnet, and since a route table can have any number of records or information, attaching multiple subnets to a route table is possible.
28. In CloudFront what happens when content is NOT present at an Edge location and a request is made to it?
A. An Error “404 not found” is returned
B. CloudFront delivers the content directly from the origin server and stores it in the cache of the edge location
C. The request is kept on hold till content is delivered to the edge location
D. The request is routed to the next closest edge location
Answer B.
Explanation: CloudFront is a content delivery system which caches data at the edge location nearest to the user, to reduce latency. If data is not present at an edge location, the first time the data gets transferred from the origin server, but from the next time onwards it will be served from the cache at the edge location.
29. If I’m using Amazon CloudFront, can I use Direct Connect to transfer objects from my own data center?
Yes. Amazon CloudFront supports custom origins including origins from outside of AWS. With AWS Direct Connect, you will be charged with the respective data transfer rates.
30. If my AWS Direct Connect fails, will I lose my connectivity?
If a backup AWS Direct connect has been configured, in the event of a failure it will switch over to the second one. It is recommended to enable Bidirectional Forwarding Detection (BFD) when configuring your connections to ensure faster detection and failover. On the other hand, if you have configured a backup IPsec VPN connection instead, all VPC traffic will failover to the backup VPN connection automatically. Traffic to/from public resources such as Amazon S3 will
be routed over the Internet. If you do not have a backup AWS Direct Connect link or an IPsec VPN link, then Amazon VPC traffic will be dropped in the event of a failure.
Section 5: Amazon Database
31. If I launch a standby RDS instance, will it be in the same Availability Zone as my primary?
A. Only for Oracle RDS types
B. Yes
C. Only if it is configured at launch
D. No
Answer D.
Explanation: No, since the purpose of having a standby instance is to avoid an infrastructure failure (if it happens), therefore the standby instance is stored in a different availability zone, which is a physically different independent infrastructure.
32. When would I prefer Provisioned IOPS over Standard RDS storage?
A. If you have batch-oriented workloads
B. If you use production online transaction processing (OLTP) workloads.
C. If you have workloads that are not sensitive to consistent performance
D. All of the above
Answer A.
Explanation: Provisioned IOPS deliver high IO rates, but they are expensive as well. Batch-processing workloads do not require manual intervention and enable full utilization of systems; therefore Provisioned IOPS will be preferred for batch-oriented workloads.
33. How is Amazon RDS, DynamoDB and Redshift different?
• Amazon RDS is a database management service for relational databases; it manages patching, upgrading, backing up of data etc. of databases for you without your intervention. RDS is a DB management service for structured data only.
• DynamoDB, on the other hand, is a NoSQL database service, NoSQL deals with unstructured data.
• Redshift, is an entirely different service, it is a data warehouse product and is used in data analysis.
34. If I am running my DB Instance as a Multi-AZ deployment, can I use the standby DB Instance for read or write operations along with primary DB instance?
A. Yes
B. Only with MySQL based RDS
C. Only for Oracle RDS instances
D. No
Answer D.
Explanation: No, the standby DB instance cannot be used in parallel with the primary DB instance, as the former is solely used for standby purposes; it cannot be used unless the primary instance goes down.
35. Your company’s branch offices are all over the world, they use a software with a multi-regional deployment on AWS, they use MySQL 5.6 for data persistence.
The task is to run an hourly batch process and read data from every region to compute cross-regional reports which will be distributed to all the branches. This should be done in the shortest time possible. How will you build the DB architecture in order to meet the requirements?
A. For each regional deployment, use RDS MySQL with a master in the region and a read replica in the HQ region
B. For each regional deployment, use MySQL on EC2 with a master in the region and send hourly EBS snapshots to the HQ region
C. For each regional deployment, use RDS MySQL with a master in the region and send hourly RDS snapshots to the HQ region
D. For each regional deployment, use MySQL on EC2 with a master in the region and use S3 to copy data files hourly to the HQ region
Answer A.
Explanation: For this we will take an RDS instance as a master, because it will manage our database for us, and since we have to read from every region, we'll put a read replica of this instance in every region where the data has to be read from. Option C is not correct, since a read replica is more efficient than a snapshot: a read replica can be promoted to an independent DB instance if needed, whereas with a DB snapshot it becomes mandatory to launch a separate DB instance.
36. Can I run more than one DB instance for Amazon RDS for free?
Yes. You can run more than one Single-AZ Micro database instance, that too for free! However, any use exceeding 750 instance hours, across all Amazon RDS Single-AZ Micro DB instances, across all eligible database engines and regions, will be billed at standard Amazon RDS prices. For example: if you run two Single-AZ Micro DB instances for 400 hours each in a single month, you will accumulate 800 instance hours of usage, of which 750 hours will be free. You will be billed for the remaining 50 hours at the standard Amazon RDS price.
37. Which AWS services will you use to collect and process e-commerce data for near real-time analysis?
A. Amazon ElastiCache
B. Amazon DynamoDB
C. Amazon Redshift
D. Amazon Elastic MapReduce
Answer B,C.
Explanation: DynamoDB is a fully managed NoSQL database service. DynamoDB can therefore be fed any type of unstructured data, which can be data from e-commerce websites as well, and later an analysis can be done on it using Amazon Redshift. We are not using Elastic MapReduce, since a near real-time analysis is needed.
38. Can I retrieve only a specific element of the data, if I have a nested JSON data in DynamoDB?
Yes. When using the GetItem, BatchGetItem, Query or Scan APIs, you can define a Projection Expression to determine which attributes should be retrieved from the table. Those attributes can include scalars, sets, or elements of a JSON document.
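A sketch with boto3 (the table name, key and attribute paths are made up for illustration):

import boto3

dynamodb = boto3.client("dynamodb")

# Fetch only specific elements of a nested JSON item instead of the whole item
resp = dynamodb.get_item(
    TableName="Users",                               # placeholder table
    Key={"UserId": {"S": "u-123"}},                  # placeholder key
    ProjectionExpression="Address.City, Orders[0]",  # nested document paths
)
print(resp.get("Item"))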
39. A company is deploying a new two-tier web application in AWS. The company has limited staff and requires high availability, and the application requires complex queries and table joins. Which configuration provides the solution for the company’s requirements?
A. MySQL Installed on two Amazon EC2 Instances in a single Availability Zone
B. Amazon RDS for MySQL with Multi-AZ
C. Amazon ElastiCache
D. Amazon DynamoDB
Answer B.
Explanation: The application requires complex queries and table joins, which calls for a relational database, and the limited staff plus the high-availability requirement calls for a managed Multi-AZ deployment. Amazon RDS for MySQL with Multi-AZ meets all of these requirements; DynamoDB is a NoSQL service and does not support table joins.
40. What happens to my backups and DB Snapshots if I delete my DB Instance?
When you delete a DB instance, you have the option of creating a final DB snapshot; if you do that, you can restore your database from that snapshot. RDS retains this user-created DB snapshot along with all other manually created DB snapshots after the instance is deleted. Automated backups, however, are deleted, and only manually created DB snapshots are retained.
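A sketch of deleting an RDS instance while keeping a final snapshot, using boto3 (the identifiers are placeholders):

import boto3

rds = boto3.client("rds")

# Take a final, user-created snapshot while deleting the DB instance;
# this snapshot is retained and the database can later be restored from it
rds.delete_db_instance(
    DBInstanceIdentifier="my-db-instance",             # placeholder
    SkipFinalSnapshot=False,
    FinalDBSnapshotIdentifier="my-db-final-snapshot",  # placeholder
)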
41. Which of the following use cases are suitable for Amazon DynamoDB? Choose 2 answers
A. Managing web sessions.
B. Storing JSON documents.
C. Storing metadata for Amazon S3 objects.
D. Running relational joins and complex updates.
Answer B,C.
Explanation: DynamoDB is a good fit for storing JSON documents and for storing metadata for Amazon S3 objects, since both are unstructured or semi-structured data (managing web sessions is also a common DynamoDB use case). It is not suitable for running relational joins and complex updates, because it is a non-relational database and does not support joins.
42. How can I load my data to Amazon Redshift from different data sources like Amazon RDS, Amazon DynamoDB and Amazon EC2?
You can load the data in the following two ways:
• You can use the COPY command to load data in parallel directly to Amazon Redshift from Amazon EMR, Amazon DynamoDB, or any SSH-enabled host.
• AWS Data Pipeline provides a high performance, reliable, fault tolerant solution to load data from a variety of AWS data sources. You can use AWS Data Pipeline to specify the data source, desired data transformations, and then execute a pre-written import script to load your data into Amazon Redshift.
43. Your application has to retrieve data from your user’s mobile every 5 minutes and the data is stored in DynamoDB, later every day at a particular time the data is extracted into S3 on a per user basis and then your application is later used to visualize the data to the user. You are asked to optimize the architecture of the backend system to lower cost, what would you recommend?
A. Create a new Amazon DynamoDB table each day and drop the one for the previous day after its data is on Amazon S3.
B. Introduce an Amazon SQS queue to buffer writes to the Amazon DynamoDB table and reduce provisioned write throughput.
C. Introduce Amazon Elasticache to cache reads from the Amazon DynamoDB table and reduce provisioned read throughput.
D. Write data directly into an Amazon Redshift cluster replacing both Amazon DynamoDB and Amazon S3.
Answer C.
Explanation: Since our work requires the data to be extracted and analyzed, one way to optimize this process would be to increase the provisioned read throughput, but since that is expensive, introducing ElastiCache to cache the results in memory can reduce the provisioned read throughput and hence reduce cost without affecting performance.
44. You are running a website on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
A. Deploy ElastiCache in-memory cache running in each availability zone
B. Implement sharding to distribute load to multiple RDS MySQL instances
C. Increase the RDS MySQL Instance size and Implement provisioned IOPS
D. Add an RDS MySQL read replica in each availability zone
Answer A,C.
Explanation: Since the site does a lot of reads and writes, provisioned IO may become expensive. But we need high performance as well, therefore the data can be cached using ElastiCache, which can be used for frequently read data. As for RDS, since read contention is happening, the instance size should be increased and provisioned IOPS should be introduced to increase the performance.
45. A startup is running a pilot deployment of around 100 sensors to measure street noise and air quality in urban areas for 3 months. It was noted that every month around 4GB of sensor data is generated. The company uses a load balanced auto scaled layer of EC2 instances and a RDS database with 500 GB standard storage. The pilot was a success and now they want to deploy at least 100K sensors which need to be supported by the backend. You need to store the data for at least 2 years to analyze it. Which setup of the following would you prefer?
A. Add an SQS queue to the ingestion layer to buffer writes to the RDS instance
B. Ingest data into a DynamoDB table and move old data to a Redshift cluster
C. Replace the RDS instance with a 6 node Redshift cluster with 96TB of storage
D. Keep the current architecture but upgrade RDS storage to 3TB and 10K provisioned IOPS
Answer C.
Explanation: A Redshift cluster would be preferred because it is easy to scale, and the work is done in parallel across the nodes, so it is perfect for a bigger workload like our use case. Since 4 GB of data is generated each month, over 2 years that is around 96 GB for the 100-sensor pilot. And since the number of sensors will be increased to 100K (roughly 1,000 times more), 96 GB will approximately become 96 TB. Hence option C is the right answer.
Section 6: AWS Auto Scaling, AWS Load Balancer
46. Suppose you have an application where you have to render images and also do some general computing. From the following services which service will best fit your need?
A. Classic Load Balancer
B. Application Load Balancer
C. Both of them
D. None of these
Answer B.
Explanation: You will choose an Application Load Balancer, since it supports path-based routing, which means it can take decisions based on the URL; therefore, if your task needs image rendering it can route those requests to one set of instances, and route the general computing requests to a different set of instances.
47. What is the difference between Scalability and Elasticity?
Scalability is the ability of a system to increase its hardware resources to handle an increase in demand. It can be done by increasing the hardware specifications or by adding processing nodes.
Elasticity is the ability of a system to handle an increase in workload by adding hardware resources when demand increases (same as scaling), but also to release those resources when they are no longer needed. This is particularly helpful in cloud environments, where a pay-per-use model is followed.
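A tiny sketch of the distinction (illustrative logic only, not an AWS API): an elastic system both adds and releases capacity as demand changes.
import math
def desired_instances(requests_per_second, capacity_per_instance=100):
    # An elastic system scales out when demand grows and scales back in when it drops
    return max(1, math.ceil(requests_per_second / capacity_per_instance))
print(desired_instances(950))   # 10 instances under peak load (scaling out)
print(desired_instances(120))   # 2 instances once demand drops (scaling back in)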
48. How will you change the instance type for instances which are running in your application tier and are using Auto Scaling? Where will you change it from the following areas?
A. Auto Scaling policy configuration
B. Auto Scaling group
C. Auto Scaling tags configuration
D. Auto Scaling launch configuration
Answer D.
Explanation: Auto Scaling tags configuration is used to attach metadata to your instances; to change the instance type you have to use the Auto Scaling launch configuration.
49. You have a content management system running on an Amazon EC2 instance that is approaching 100% CPU utilization. Which option will reduce load on the Amazon EC2 instance?
A. Create a load balancer, and register the Amazon EC2 instance with it
B. Create a CloudFront distribution, and configure the Amazon EC2 instance as the origin
C. Create an Auto Scaling group from the instance using the CreateAutoScalingGroup action
D. Create a launch configuration from the instance using the CreateLaunchConfigurationAction
Answer A.
Explanation: Creating an Auto Scaling group alone will not solve the issue until you attach a load balancer to it. Once you attach a load balancer to an Auto Scaling group, it will efficiently distribute the load among all the instances. Option B – CloudFront is a CDN, a data transfer tool, so by itself it will not reduce the load on the EC2 instance here. Similarly, a launch configuration is only a template for instance configuration and has no connection with reducing load.
50. When should I use a Classic Load Balancer and when should I use an Application load balancer?
A Classic Load Balancer is ideal for simple load balancing of traffic across multiple EC2 instances, while an Application Load Balancer is ideal for microservices or container-based architectures where there is a need to route traffic to multiple services or load balance across multiple ports on the same EC2 instance.
51. What does Connection draining do?
A. Terminates instances which are not in use.
B. Re-routes traffic from instances which are to be updated or failed a health check.
C. Re-routes traffic from instances which have more workload to instances which have less workload.
D. Drains all the connections from an instance, with one click.
Answer B.
Explanation: Connection draining is an ELB feature that takes effect when an instance fails a health check or has to be taken out of service, for example to be patched with a software update; the load balancer stops sending new requests to that instance and re-routes traffic to the other instances.
52. When an instance is unhealthy, it is terminated and replaced with a new one, which of the following services does that?
A. Sticky Sessions
B. Fault Tolerance
C. Connection Draining
D. Monitoring
Answer B.
Explanation: When ELB detects that an instance is unhealthy, it starts routing incoming traffic to the other healthy instances. If all the instances in an availability zone become unhealthy and you have instances in another availability zone, traffic is directed to them. Once the original instances become healthy again, traffic is routed back to them.
53. What are lifecycle hooks used for in AutoScaling?
A. They are used to do health checks on instances
B. They are used to put an additional wait time to a scale in or scale out event.
C. They are used to shorten the wait time to a scale in or scale out event
D. None of these
Answer B.
Explanation: Lifecycle hooks are used to add a wait time before a lifecycle action, i.e. launching or terminating an instance, completes. The purpose of this wait time can be anything from extracting log files before terminating an instance to installing the necessary software on an instance before launching it.
54. A user has setup an Auto Scaling group. Due to some issue the group has failed to launch a single instance for more than 24 hours. What will happen to Auto Scaling in this condition?
A. Auto Scaling will keep trying to launch the instance for 72 hours
B. Auto Scaling will suspend the scaling process
C. Auto Scaling will start an instance in a separate region
D. The Auto Scaling group will be terminated automatically
Answer B.
Explanation: Auto Scaling allows you to suspend and then resume one or more of the Auto Scaling processes in your Auto Scaling group. This can be very useful when you want to investigate a configuration problem or other issue with your web application, and then make changes to your application, without triggering the Auto Scaling process.
Section 7: CloudTrail, Route 53
55. You have an EC2 Security Group with several running EC2 instances. You changed the Security Group rules to allow inbound traffic on a new port and protocol, and then launched several new instances in the same Security Group. The new rules apply:
A. Immediately to all instances in the security group.
B. Immediately to the new instances only.
C. Immediately to the new instances, but old instances must be stopped and restarted before the new rules apply.
D. To all instances, but it may take several minutes for old instances to see the changes.
Answer A.
Explanation: Any rule added to an EC2 Security Group applies immediately to all instances in that group, irrespective of whether they were launched before or after the rule was added.
56. To create a mirror image of your environment in another region for disaster recovery, which of the following AWS resources do not need to be recreated in the second region? ( Choose 2 answers )
A. Route 53 Record Sets
B. Elastic IP Addresses (EIP)
C. EC2 Key Pairs
D. Launch configurations
E. Security Groups
Answer A.
Explanation: Route 53 is a global service, so record sets are common assets that do not need to be recreated in the second region.
57. A customer wants to capture all client connection information from his load balancer at an interval of 5 minutes, which of the following options should he choose for his application?
A. Enable AWS CloudTrail for the loadbalancer.
B. Enable access logs on the load balancer.
C. Install the Amazon CloudWatch Logs agent on the load balancer.
D. Enable Amazon CloudWatch metrics on the load balancer.
Answer B.
Explanation: Elastic Load Balancing access logs capture detailed information about client connections to the load balancer (request time, client IP address, latencies, request paths and backend responses) and can be published to Amazon S3 at 5-minute intervals, which is exactly what is required here. AWS CloudTrail, by contrast, records API calls made on the load balancer itself (for example creating or modifying it), not client connection information.
58. A customer wants to track access to their Amazon Simple Storage Service (S3) buckets and also use this information for their internal security and access audits. Which of the following will meet the Customer requirement?
A. Enable AWS CloudTrail to audit all Amazon S3 bucket access.
B. Enable server access logging for all required Amazon S3 buckets.
C. Enable the Requester Pays option to track access via AWS Billing
D. Enable Amazon S3 event notifications for Put and Post.
Answer A.
Explanation: AWS CloudTrail is designed for logging and tracking API calls, and it can record access to S3 buckets, so it can be used for this kind of security and access audit.
59. Which of the following are true regarding AWS CloudTrail? (Choose 2 answers)
A. CloudTrail is enabled globally
B. CloudTrail is enabled on a per-region and service basis
C. Logs can be delivered to a single Amazon S3 bucket for aggregation.
D. CloudTrail is enabled for all available services within a region.
Answer B,C.
Explanation: CloudTrail is enabled on a per-region basis and does not cover every service, so option B is correct. The logs from one or more regions can be delivered to a single S3 bucket for aggregation, hence C is also correct.
60. What happens if CloudTrail is turned on for my account but my Amazon S3 bucket is not configured with the correct policy?
CloudTrail files are delivered according to S3 bucket policies. If the bucket is not configured or is misconfigured, CloudTrail might not be able to deliver the log files.
61. How do I transfer my existing domain name registration to Amazon Route 53 without disrupting my existing web traffic?
You will first need a list of the DNS record data for your domain name; it is generally available in the form of a "zone file" that you can get from your existing DNS provider. Once you have the DNS record data, you can use Route 53's Management Console or its simple web-services interface to create a hosted zone that will store the DNS records for your domain name, and then follow the transfer process. This includes steps such as updating the nameservers for your domain name to the ones associated with your hosted zone. To complete the process, you have to contact the registrar with whom you registered your domain name and follow their transfer procedure. As soon as your registrar propagates the new name server delegations, your DNS queries will start to get answered.
Section 8: AWS SQS, AWS SNS, AWS SES, AWS ElasticBeanstalk
62. Which of the following services you would not use to deploy an app?
A. Elastic Beanstalk
B. Lambda
C. Opsworks
D. CloudFormation
Answer B.
Explanation: Lambda is used for running serverless code: functions that are triggered by events. "Serverless" means you do not have to worry about the computing resources running in the background. It is not designed for deploying and managing complete applications in the way the other options are.
63. How does Elastic Beanstalk apply updates?
A. By having a duplicate ready with updates before swapping.
B. By updating on the instance while it is running
C. By taking the instance down in the maintenance window
D. Updates should be installed manually
Answer A.
Explanation: Elastic Beanstalk prepares a duplicate copy of the instance before updating the original instance and routes your traffic to the duplicate, so that in case your updated application fails, it can switch back to the original instance and users experience no downtime.
64. How is AWS Elastic Beanstalk different than AWS OpsWorks?
➢ AWS Elastic Beanstalk is an application management platform while OpsWorks is a configuration management platform. BeanStalk is an easy to use service which is used for deploying and scaling web applications developed with Java, .Net, PHP, Node.js, Python, Ruby, Go and Docker. Customers upload their code and Elastic Beanstalk automatically handles the deployment. The application will be ready to use without any infrastructure or resource configuration.
➢ In contrast, AWS Opsworks is an integrated configuration management platform for IT administrators or DevOps engineers who want a high degree of customization and control over operations.
65. What happens if my application stops responding to requests in beanstalk?
AWS Elastic Beanstalk applications have a system in place for avoiding failures in the underlying infrastructure. If an Amazon EC2 instance fails for any reason, Beanstalk will use Auto Scaling to automatically launch a new instance. Beanstalk can also detect if your application is not responding on the custom health check URL, even though the infrastructure appears healthy; this is logged as an environment event (e.g. a bad version was deployed) so that you can take appropriate action.
Section 9: AWS OpsWorks, AWS KMS
66. How is AWS OpsWorks different than AWS CloudFormation?
➢ OpsWorks and CloudFormation both support application modelling, deployment, configuration, management and related activities. Both support a wide variety of architectural patterns, from simple web applications to highly complex applications. AWS OpsWorks and AWS CloudFormation differ in abstraction level and areas of focus.
➢ AWS CloudFormation is a building-block service which enables customers to manage almost any AWS resource via a JSON-based domain-specific language. It provides foundational capabilities for the full breadth of AWS, without prescribing a particular model for development and operations. Customers define templates and use them to provision and manage AWS resources, operating systems and application code.
In contrast, AWS OpsWorks is a higher level service that focuses on providing highly productive and reliable DevOps experiences for IT administrators and ops-minded developers. To do this, AWS OpsWorks employs a configuration management model based on concepts such as stacks and layers, and provides integrated experiences for key activities like deployment, monitoring, auto-scaling, and automation. Compared to AWS CloudFormation, AWS OpsWorks supports a narrower range of application-oriented AWS resource types including Amazon EC2 instances, Amazon EBS volumes, Elastic IPs, and Amazon CloudWatch metrics.
67. I created a key in Oregon region to encrypt my data in North Virginia region for security purposes. I added two users to the key and an external AWS account. I wanted to encrypt an object in S3, so when I tried, the key that I just created was not listed. What could be the reason?
A. External AWS accounts are not supported.
B. AWS S3 cannot be integrated with KMS.
C. The Key should be in the same region.
D. New keys take some time to reflect in the list.
Answer C.
Explanation: The key created and the data to be encrypted should be in the same region. Hence the approach taken here to secure the data is incorrect.
68. A company needs to monitor the read and write IOPS for their AWS MySQL RDS instance and send real-time alerts to their operations team. Which AWS services can accomplish this?
A. Amazon Simple Email Service
B. Amazon CloudWatch
C. Amazon Simple Queue Service
D. Amazon Route 53
Answer B.
Explanation: Amazon CloudWatch is a cloud monitoring service, so it is the right choice for the mentioned use case. The other options listed serve different purposes; for example, Route 53 is a DNS service. Therefore CloudWatch is the apt choice.
69. What happens when one of the resources in a stack cannot be created successfully in AWS OpsWorks?
When an event like this occurs, the "automatic rollback on error" feature causes all the AWS resources that were created successfully up to the point where the error occurred to be deleted. This is helpful because it does not leave behind any partially created resources: stacks are either created fully or not created at all. It is useful in situations where you accidentally exceed your limit on the number of Elastic IP addresses, or do not have access to an EC2 AMI that you are trying to run, and so on.
70. What automation tools can you use to spinup servers?
Any of the following tools can be used:
• Roll your own scripts and use the AWS API tools. Such scripts can be written in Bash, Perl or another language of your choice.
• Use a configuration management and provisioning tool like Puppet or Chef. You can also use a tool like Scalr.
• Use a managed solution such as RightScale.

Monday, 29 April 2019

Top 50 Devops interview questions


1) Explain what is DevOps?
It is a newly emerging term in the IT field; it is a practice that emphasizes collaboration and communication between software developers and other information-technology (IT) professionals. It focuses on delivering software faster and lowering the failure rate of releases.
2) Mention what are the key aspects or principle behind DevOps?
The key aspects or principles behind DevOps are:
Infrastructure as code
Continuous deployment
Automation
Monitoring
Security
3) What are the core operations of DevOps with application development and with infrastructure?
The core operations of DevOps are:
With application development:
Code building
Code coverage
Unit testing
Packaging
Deployment
With infrastructure:
Provisioning
Configuration
Orchestration
Deployment
4) What is GIT?
GIT is a distributed version control system and source code management (SCM) system with an emphasis on handling small and large projects with speed and efficiency.
5) What is a repository in GIT?
A repository contains a directory named .git, where Git keeps all of its metadata for the repository. The contents of the .git directory are private to Git.
6) What is the difference between GIT and SVN?
The differences between GIT and SVN are:
a) Git is less preferred for handling extremely large files or frequently changing binary files, while SVN can handle multiple projects stored in the same repository.
b) GIT does not support 'commits' across multiple branches or tags. Subversion allows the creation of folders at any location in the repository layout.
c) Git tags are unchangeable, while Subversion allows committers to treat a tag as a branch and to create multiple revisions under a tag root.
7) What are the advantages of using GIT?
a) Data redundancy and replication
b) High availability
c) Only one .git directory per repository
d) Superior disk utilization and network performance
e) Collaboration friendly
f) Any sort of projects can use GIT
8) Why is GIT better than Subversion?
GIT is an open source version control system; it allows you to run 'versions' of a project, which show the changes that were made to the code over time, and it allows you to backtrack if necessary and undo those changes. Multiple developers can check out and upload changes, and each change can then be attributed to a specific developer.
9) What is “Staging Area” or “Index” in GIT?
Before completing the commits, it can be formatted and reviewed in an intermediate area known as ‘Staging Area’ or ‘Index’.
10) What is GIT stash?
GIT stash takes the current state of the working directory and index and puts it on a stack for later, giving you back a clean working directory. So if you are in the middle of something and need to jump over to another job, and at the same time you don't want to lose your current edits, you can use GIT stash.
11) What is the function of git clone?
The git clone command creates a copy of an existing Git repository. To get the copy of a central repository, ‘cloning’ is the most common way used by programmers.
12) What does commit object contain?
a) A set of files, representing the state of a project at a given point of time
b) Reference to parent commit objects
c) An SHA-1 name, a 40-character string that uniquely identifies the commit object.
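That 40-character name is the SHA-1 hash of the object's content. A minimal sketch of how Git computes such a name for a blob (the same scheme, with a different header, is used for commit objects):
import hashlib
content = b'hello world\n'
store = b'blob ' + str(len(content)).encode() + b'\x00' + content   # '<type> <size>\0<content>'
print(hashlib.sha1(store).hexdigest())   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad, matching `git hash-object`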
13) How can you create a repository in Git?
In Git, to create a repository, create a directory for the project if it does not exist, and then run the command "git init". Running this command creates a .git directory inside the project directory; the directory does not need to be empty.
14) What is the purpose of branching in GIT?
The purpose of branching in GIT is that you can create your own branch and jump between those branches. It will allow you to go to your previous work keeping your recent work intact.
15) What is the common branching pattern in GIT?
The common way of branching in GIT is to maintain one branch as the "Main" branch and create another branch to implement new features. This pattern is particularly useful when multiple developers are working on a single project.
16) What is a ‘conflict’ in git?
A ‘conflict’ arises when the commit that has to be merged has some change in one place, and the current commit also has a change at the same place. Git will not be able to predict which change should take precedence.
17) What is GIT version control?
With the help of GIT version control you can track the history of a collection of files, and it includes the functionality to revert the collection of files to another version. Each version captures a snapshot of the file system at a certain point in time. A collection of files and their complete history are stored in a repository.
18) What is Chef?
Chef is a powerful configuration management and automation tool. It converts the infrastructure of the company into a structured format of code. Thanks to Chef you can develop scripts that can be used to automate IT and business processes.
19) Explain the major components of Chef?
The architecture of Chef can be broken down into the Chef Server, Chef Workstation and Chef Node.
Chef Server – You can think of the Chef Server as the central store that accumulates all the data necessary for configuring the nodes.
Chef Node – The Chef Node can be thought of as a client responsible for sharing data across network and it is based on the chef-client architecture.
Chef Workstation – The Chef Workstation can be thought of as the host for modifying the configuration data and cookbooks which is then forwarded to the Chef Server.
20) What is a Chef resource and what are its functions?
You can think of a Chef resource as a part of the infrastructure that is used for installing or running a service. Some of the important things a Chef resource does:
It describes the desired state of a configuration item
It declares the steps needed to bring that item to the desired state
You can choose resource types such as package, template or service
You can list the resource properties and other details needed
You can group resources into recipes which describe working configurations.
21) What is a Chef Recipe and what are its functions?
When you group resources together, you get a Recipe, which describes the working configuration and policy. Using a Recipe you can get everything necessary for configuring a specific system. Some of the important functions of a Recipe:
With the Chef Recipe you can install the software components
You can manage the files and apps deployment
You can execute the other recipes using one recipe.
22) What is a Chef Node and what is its importance?
We can think of a node as a physical server or a virtual machine that is a constituent of the Chef architecture; the chef-client runs on each node and applies the configured resources to it.
23) What is the difference between a Cookbook and a Recipe in Chef?
When you group resources together what you get is a Recipe and this is useful in executing the configurations and policy. When you combine Recipes what you get is a Cookbook and this is easily manageable as compared to a Recipe.
24) What happens when the action for a Chef Resource is not defined?
When the action for a Chef Resource is not defined, the resource automatically uses its default action.
25) How does a Chef Repository work?
You can think of the Chef Repository as a collection of Cookbooks, roles, environments, data bags and more. The Chef Repository can be kept in sync with a version control system such as Git in order to track and manage changes to it.
26) Explain the run-list in Chef?
The run-list specifies the Recipes that are to be run on a node and the order in which they will be executed.
Some of the advantages of run-list include:
It ensures that the Recipes are running in the same order as specified
The node on which the run-list is to be executed has to be specified
It is transferred from the Workstation to Chef Server and the management console.
27) What is the importance of Chef starter kit?
The starter kit is needed for creating the configuration files in Chef. It gives you the information for interacting with the server and defining the configuration file. You can easily download the starter kit and use it at the desired place on the workstation.
28) What is the process for updating a Chef Cookbook?
These are the steps to follow for updating a Chef Cookbook:
Run knife ssh from the Workstation
SSH to the server directly and run chef-client
Run chef-client as a daemon (service) so it applies changes as needed.
29) What is the process for bootstrapping in Chef and the information needed?
If you want to bootstrap in Chef, then you need the following information:
The hostname or Public IP address of the node
The user name and password for logging into a particular node
SSH keys can be used for authentication instead of a user name and password.
30) What is Docker?
You can define Docker as a containerization platform that packages your application together with all of its dependencies so that it runs in any environment. This means your application will run seamlessly in any environment, which makes it easy to have a production-ready application. Docker wraps the software in a complete file system that contains everything needed to run the code: the runtime, system tools and all the necessary libraries. Containerization technology like Docker shares the operating system kernel with the host machine, and because of this it is extremely fast: you only have to start the container, and since the OS is already running, the process is smooth and seamless.
31) What is the benefit of using a Docker over a hypervisor?
Though Docker and a hypervisor might do the same job overall, there are many differences in how they work. Docker can be thought of as lightweight since it uses far fewer resources and shares the host kernel rather than creating a full guest operating system the way a hypervisor does.
32) What are the unique features of Docker over other containerization technology?
We list some of the most important and unique features of Docker that make it a top containerization technology unlike any other in the market today:
You can run your Docker container either on your PC or your enterprise IT system
With Docker Hub, which is a registry of container images, you can deploy and download your applications from a central location
You can even share your applications with the containers that you create.
33) What is Docker image?
A Docker image is what is used to create Docker containers. You create a Docker image with the build command; when the image is run, it produces a container. Docker images are stored in a Docker registry, such as the public Docker Hub registry. Images are built up from layers, and common layers are shared, so only a minimal amount of data has to be sent over the network.
34) What is Docker container?
A Docker container is a comprehensive package consisting of an application and all its dependencies; it shares the OS kernel with the other containers, each running as an isolated process in user space on the host operating system. Docker containers are not tied to any specific IT infrastructure, so they can run on any computer system or in the cloud. You can create a Docker container from a Docker image and then run it, or you can use images that are already available in Docker Hub. To simplify things, Docker containers are just runtime instances of Docker images.
35) What is Docker hub?
You can think of Docker Hub as a cloud registry that lets you link the code repositories, create the images and test them. You can also store your pushed images, or you can link to the Docker Cloud, so that the images can be deployed to the host. You have a centralized container image discovery resource which can be used for collaboration of your teams, automating the workflow, distribution and change management by creating the development pipeline.
36) What is the use of Dockerfile?
The Dockerfile can be thought of as a set of instructions that you need to pass on to the Docker so that the images can be built from the specified instructions in the Dockerfile. You can think of the Dockerfile as a text document which has all the commands that are needed for creating a Docker image. You can create an automated build that lets you execute multiple command-lines one after the other.
37) What is the process for creating a Docker container?
You can create a Docker container from a specific Docker image using the command below.
docker run -t -i <image name>
This command not only creates the container but also starts it for you. If you want to check whether the Docker container has been created, the following command lists all the Docker containers on the host along with their status.
docker ps -a
38) What is Ansible?
Ansible is an open source automation platform that helps with configuration management, task automation and application deployment. Unlike other configuration tools that use an agent architecture, Ansible connects over SSH, which is installed on virtually all systems. Ansible can also do IT orchestration, where you run tasks and create a chain of events that happen on different servers and devices. It is written in Python, which needs to be installed on the remote host. Ansible is very easy to set up, yet it is a very powerful tool for software deployment.
39) Talk about Ansible architecture.
Ansible works on an 'agentless architecture'. It connects to your nodes and pushes out small programs called Ansible modules to them. These modules describe the desired state of the system and are written out to the resource nodes. Ansible then executes these modules over SSH and removes them when done. Since Ansible is agentless, your pool of modules can reside on any machine, without requiring any servers, daemons or databases. You just need a terminal program, a text editor and a version control system to keep track of changes to your content. Ansible's authorized_key module is used to manage the SSH keys that determine which machines can connect to which hosts.
40) Give a comparison between Ansible and puppet.
A comparative study between Ansible and Puppet is given below:
Ansible: Ansible is very simple to set up. It is a simple technology whose playbooks are written in YAML. It is based on an agentless architecture, which doesn't require nodes to locally install and run daemons. It facilitates an automated workflow for continuous, hassle-free delivery. The Ansible control machine does not run on Windows. It is driven from a simple CLI, and a GUI is available through Ansible Tower.
Puppet: Puppet is a complex technology compared with Ansible. It is written in the Ruby language. It facilitates visualisation and reporting. It is not based on an agentless architecture (nodes run a Puppet agent), and, unlike Ansible, Puppet supports almost all major operating systems, including Windows. The prerequisite for using Puppet is that the user must learn the Puppet DSL.
41) How to keep secret data in playbook?
If you want to keep secret data in your Ansible content and still share it publicly, you can use Ansible Vault in your playbooks.
42) When should you test playbooks and roles?
In Ansible, tests are added either to new playbooks or to existing ones. Most testing jobs provide a clean test host each time they run, so with this method you need to make only very minor changes to the code.
43) What is Ansible role?
An Ansible role is a way of organising playbook content (tasks, handlers, variables, templates and files) into a reusable, self-contained unit. The very first step in creating an Ansible role is creating its directory structure.
44) List some advantages of using Ansible.
Unlike other configuration management systems, Ansible is one of the most sought-after software applications these days. It offers the following benefits to its users:
Agentless- Its work structure makes use of agentless architecture. The nodes are not required to install and run background daemons to connect with a controlling machine.
Low overhead- due to the agentless model, Ansible reduces the overhead on the network by preventing the nodes from polling the controlling machine.
Secure and consistent- Ansible only uses SSH and Python on the managed nodes. This ensures safety and security. Also, Ansible ensures consistent environments.
Reliable- an Ansible playbook can be idempotent when written carefully. This prevents unexpected side-effects on the managed systems.
Good performance- Ansible delivers flawless performance. Though it is very easy to set up yet it is a powerful tool for deploying software applications using SSH.
45) What are the software prerequisites that must be met before Jenkins is installed?
The main software prerequisite for installing Jenkins is a Java Development Kit (JDK). Jenkins also comes with an embedded Jetty runtime, so it can run standalone if a servlet container such as WebSphere or Tomcat is not available.
46) How to configure and use third-party tools in Jenkins?
These are some of the steps used for working with a third-party tool in Jenkins.
You have to first install the third-party software
You need to have the plug-in that supports the third-party tool.
You have to configure the third-party tool in the admin console.
You can then use the plug-in from the Jenkins build job.
47) How to take a backup of your Jenkins build jobs?
Each Jenkins build job is stored as XML configuration inside the Jenkins home folder. When this folder is copied, the configuration of all the build jobs managed by the Jenkins master is backed up. Keeping this configuration under Git (a Jenkins Git integration) is also a good idea. When you copy the contents of the folder back, the build jobs described in it will be restored the next time the Jenkins server is started.
48) What are the steps included in a Jenkins pipeline?
A complete Jenkins pipeline will include building a project from source code, putting it through a variety of unit, integration, user-acceptance and performance tests, and then finally deploying the packaged application on an application server.
So the steps in a Jenkins pipeline will include:
Build
Test
Deploy
49) Explain what is the Jenkins tool?
Jenkins can be thought of as an open source automation tool that is used for continuous integration. You will be able to continuously test your software projects so that the developers will be able to integrate the changes to the project. You can also integrate with a large number of testing and deployment technologies.
50) State some of the advantages of using Jenkins?
Here are some of the most important advantages of Jenkins:
You will get an automated build report every time a change is made to the source code
You will be able to achieve continuous integration with agile methodology principles
You can automate the maven release project with a few simple steps
The bugs can be easily tracked at the early development stage.
51) What are the requirements for using Jenkins?
Here we list some of the requirements for using Jenkins:
A source code repository like a Git repository
A build script like a Maven script that is checked into the repository.
52) How to schedule builds in Jenkins?
Builds in Jenkins can be triggered in several ways:
By a source code management commit
After the completion of other builds
By scheduling them to run at a specified time
By making a manual build request.

Wednesday, 10 April 2019

Python Standard Library for data science and ML



NumPy
NumPy (or Numerical Python) is one of the principal packages for data science applications. It's often used to process large multidimensional arrays and matrices, and it provides an extensive collection of high-level mathematical functions to operate on them. Implementation methods also make it easy to conduct multiple operations with these objects.
There have been many improvements made over the last year that have resolved several bugs and compatibility issues. NumPy is popular because it can be used as a highly efficient multi-dimensional container of generic data. It’s also an excellent library as it makes data analysis simple by processing data faster while using a lot less code than lists.
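A short example of the kind of vectorised work NumPy handles (a minimal sketch; assumes the numpy package is installed):
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])   # a 2 x 3 multidimensional array
print(a * 2)        # element-wise operation, no explicit Python loop
print(a.mean())     # 3.5, an aggregate over the whole array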
Pandas
Pandas is a Python library that provides highly flexible and powerful tools and high-level data structures for analysis. Pandas is an excellent tool for data analytics because it can translate highly complex operations with data into just one or two commands.
Pandas comes with a variety of built-in methods for combining, filtering, and grouping data. It also boasts time-series functionality that is closely followed by remarkable speed indicators.
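For instance, filtering and grouping a small dataset takes only a command or two (a minimal sketch; assumes the pandas package is installed):
import pandas as pd
df = pd.DataFrame({'city': ['Delhi', 'Delhi', 'Mumbai'], 'noise_db': [72, 68, 75]})
print(df[df['noise_db'] > 70])                 # filter rows with a single expression
print(df.groupby('city')['noise_db'].mean())   # group and aggregate in one command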
SciPy
SciPy is another outstanding library for scientific computing. It’s based on NumPy and was created to extend its capabilities. Like NumPy, SciPy’s data structure is also a multidimensional array that’s implemented by NumPy.
The SciPy package contains powerful tools that help solve tasks related to integral calculus, linear algebra, probability theory, and much more.
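For example, a definite integral can be evaluated with a single call (a minimal sketch; assumes the scipy package is installed):
from scipy import integrate
value, error_estimate = integrate.quad(lambda x: x ** 2, 0, 1)   # definite integral of x^2 over [0, 1]
print(value)   # ~0.3333..., the exact answer is 1/3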

Python Interview Questions And Answers Part 1



What is Python? 


  • Python is a high-level, interpreted, interactive and object-oriented scripting language. 
  • Python is designed to be highly readable. 
  • It uses English keywords frequently whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.
What are the supported data types in Python?

          Python has five standard data types (a short example follows the list) −
  • Numbers
  • String
  • List
  • Tuple
  • Dictionary
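A minimal example showing one value of each of these standard types:
num = 42                        # Number
name = 'Chelsea'                # String
scores = [10, 20, 30]           # List
point = (1.5, 2.5)              # Tuple
capitals = {'India': 'Delhi'}   # Dictionary
print(type(num), type(name), type(scores), type(point), type(capitals))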

What is the difference between lists and tuples?

      
LIST | TUPLES
Lists are mutable, i.e. they can be edited. | Tuples are immutable (tuples are lists which can't be edited).
Lists are slower than tuples. | Tuples are faster than lists.
Syntax: list_1 = [10, 'Chelsea', 20] | Syntax: tup_1 = (10, 'Chelsea', 20)
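A minimal example of the practical difference:
list_1 = [10, 'Chelsea', 20]
tup_1 = (10, 'Chelsea', 20)
list_1[0] = 99          # works, lists are mutable
try:
    tup_1[0] = 99       # raises TypeError, tuples are immutable
except TypeError as err:
    print('Tuples cannot be edited:', err)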

How is memory managed in Python?

      Python memory is managed by Python's private heap space: all Python objects and data structures are located in a private heap. The allocation of heap space for Python objects is handled by the Python memory manager, and Python also has an inbuilt garbage collector that recycles unused memory.
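A small illustration of reference counting and the garbage collector in CPython (a minimal sketch; exact reference counts are an implementation detail):
import sys, gc
x = [1, 2, 3]
y = x                        # a second reference to the same list object on the private heap
print(sys.getrefcount(x))    # count includes the temporary reference made by the function call
del y                        # the object is reclaimed once its reference count drops to zero
print(gc.isenabled())        # True: the cyclic garbage collector is enabled by default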


What is dictionary in Python?

A dictionary is one of the built-in datatypes in Python. It defines a one-to-one relationship between keys and values. Dictionaries contain pairs of keys and their corresponding values.

Example :
d = {'Country': 'India', 'Capital': 'Delhi', 'PM': 'Modi'}
print(d['Country'])   # prints: India





Git

1. git add ↳ It lets you add changes from the working directory into the staging area
2. git commit ↳ It lets you save a snapshot of currently...